| 1 | =============================================== |
| 2 | The Linux WatchDog Timer Driver Core kernel API |
| 3 | =============================================== |
| 4 | |
| 5 | Last reviewed: 12-Feb-2013 |
| 6 | |
| 7 | Wim Van Sebroeck <wim@iguana.be> |
| 8 | |
| 9 | Introduction |
| 10 | ------------ |
| 11 | This document does not describe what a WatchDog Timer (WDT) Driver or Device is. |
| 12 | It also does not describe the API which can be used by user space to communicate |
| 13 | with a WatchDog Timer. If you want to know this then please read the following |
| 14 | file: Documentation/watchdog/watchdog-api.rst . |
| 15 | |
| 16 | So what does this document describe? It describes the API that can be used by |
| 17 | WatchDog Timer Drivers that want to use the WatchDog Timer Driver Core |
| 18 | Framework. This framework provides all interfacing towards user space so that |
| 19 | the same code does not have to be reproduced each time. This also means that |
| 20 | a watchdog timer driver then only needs to provide the different routines |
| 21 | (operations) that control the watchdog timer (WDT). |
| 22 | |
| 23 | The API |
| 24 | ------- |
| 25 | Each watchdog timer driver that wants to use the WatchDog Timer Driver Core |
| 26 | must #include <linux/watchdog.h> (you would have to do this anyway when |
| 27 | writing a watchdog device driver). This include file contains the following |
| 28 | register/unregister routines:: |
| 29 | |
| 30 | extern int watchdog_register_device(struct watchdog_device *); |
| 31 | extern void watchdog_unregister_device(struct watchdog_device *); |
| 32 | |
| 33 | The watchdog_register_device routine registers a watchdog timer device. |
| 34 | The parameter of this routine is a pointer to a watchdog_device structure. |
| 35 | This routine returns zero on success and a negative errno code for failure. |
| 36 | |
| 37 | The watchdog_unregister_device routine deregisters a registered watchdog timer |
| 38 | device. The parameter of this routine is the pointer to the registered |
| 39 | watchdog_device structure. |
| 40 | |
| 41 | The watchdog subsystem includes a registration deferral mechanism, |
| 42 | which allows you to register a watchdog as early as you wish during |
| 43 | the boot process. |
| 44 | |
| 45 | The watchdog device structure looks like this:: |
| 46 | |
| 47 | struct watchdog_device { |
| 48 | int id; |
| 49 | struct device *parent; |
| 50 | const struct attribute_group **groups; |
| 51 | const struct watchdog_info *info; |
| 52 | const struct watchdog_ops *ops; |
| 53 | const struct watchdog_governor *gov; |
| 54 | unsigned int bootstatus; |
| 55 | unsigned int timeout; |
| 56 | unsigned int pretimeout; |
| 57 | unsigned int min_timeout; |
| 58 | unsigned int max_timeout; |
| 59 | unsigned int min_hw_heartbeat_ms; |
| 60 | unsigned int max_hw_heartbeat_ms; |
| 61 | struct notifier_block reboot_nb; |
| 62 | struct notifier_block restart_nb; |
| 63 | void *driver_data; |
| 64 | struct watchdog_core_data *wd_data; |
| 65 | unsigned long status; |
| 66 | struct list_head deferred; |
| 67 | }; |
| 68 | |
| 69 | It contains the following fields: |
| 70 | |
| 71 | * id: set by watchdog_register_device, id 0 is special. It has both a |
| 72 | /dev/watchdog0 cdev (dynamic major, minor 0) as well as the old |
| 73 | /dev/watchdog miscdev. The id is set automatically when calling |
| 74 | watchdog_register_device. |
| 75 | * parent: set this to the parent device (or NULL) before calling |
| 76 | watchdog_register_device. |
| 77 | * groups: List of sysfs attribute groups to create when creating the watchdog |
| 78 | device. |
| 79 | * info: a pointer to a watchdog_info structure. This structure gives some |
| 80 | additional information about the watchdog timer itself (such as its unique name). |
| 81 | * ops: a pointer to the list of watchdog operations that the watchdog supports. |
| 82 | * gov: a pointer to the assigned watchdog device pretimeout governor or NULL. |
| 83 | * timeout: the watchdog timer's timeout value (in seconds). |
| 84 | This is the time after which the system will reboot if WDOG_ACTIVE is set |
| 85 | and user space does not send a heartbeat request. |
| 86 | * pretimeout: the watchdog timer's pretimeout value (in seconds). |
| 87 | * min_timeout: the watchdog timer's minimum timeout value (in seconds). |
| 88 | If set, the minimum configurable value for 'timeout'. |
| 89 | * max_timeout: the watchdog timer's maximum timeout value (in seconds), |
| 90 | as seen from userspace. If set, the maximum configurable value for |
| 91 | 'timeout'. Not used if max_hw_heartbeat_ms is non-zero. |
| 92 | * min_hw_heartbeat_ms: Hardware limit for minimum time between heartbeats, |
| 93 | in milliseconds. This value is normally 0; it should only be provided |
| 94 | if the hardware cannot tolerate lower intervals between heartbeats. |
| 95 | * max_hw_heartbeat_ms: Maximum hardware heartbeat, in milliseconds. |
| 96 | If set, the infrastructure will send heartbeats to the watchdog driver |
| 97 | if 'timeout' is larger than max_hw_heartbeat_ms, unless WDOG_ACTIVE |
| 98 | is set and userspace failed to send a heartbeat for at least 'timeout' |
| 99 | seconds. max_hw_heartbeat_ms must be set if a driver does not implement |
| 100 | the stop function. |
| 101 | * reboot_nb: notifier block that is registered for reboot notifications, for |
| 102 | internal use only. If the driver calls watchdog_stop_on_reboot, watchdog core |
| 103 | will stop the watchdog on such notifications. |
| 104 | * restart_nb: notifier block that is registered for machine restart, for |
| 105 | internal use only. If a watchdog is capable of restarting the machine, it |
| 106 | should define ops->restart. Priority can be changed through |
| 107 | watchdog_set_restart_priority. |
| 108 | * bootstatus: status of the device after booting (reported with watchdog |
| 109 | WDIOF_* status bits). |
| 110 | * driver_data: a pointer to the driver's private data of a watchdog device. |
| 111 | This data should only be accessed via the watchdog_set_drvdata and |
| 112 | watchdog_get_drvdata routines. |
| 113 | * wd_data: a pointer to watchdog core internal data. |
| 114 | * status: this field contains a number of status bits that give extra |
| 115 | information about the status of the device (Like: is the watchdog timer |
| 116 | running/active, or is the nowayout bit set). |
| 117 | * deferred: entry in wtd_deferred_reg_list which is used to |
| 118 | register early initialized watchdogs. |
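
The interplay of min_timeout, max_timeout and max_hw_heartbeat_ms described
above can be sketched as a small range check. This is only an illustration
of the documented rules, not the code used by the watchdog core: the struct
wdt_limits type and the timeout_in_range helper are hypothetical stand-ins
so that the example compiles outside the kernel tree.

```c
#include <stdbool.h>

/* Stand-in for the limit fields of struct watchdog_device (not the kernel type). */
struct wdt_limits {
	unsigned int min_timeout;         /* seconds, 0 = not set */
	unsigned int max_timeout;         /* seconds, 0 = not set */
	unsigned int max_hw_heartbeat_ms; /* milliseconds, 0 = not set */
};

/* Hypothetical helper: does 't' (in seconds) respect the documented limits? */
static bool timeout_in_range(const struct wdt_limits *l, unsigned int t)
{
	/* If set, min_timeout is the smallest configurable 'timeout'. */
	if (l->min_timeout && t < l->min_timeout)
		return false;

	/* max_timeout is not used when max_hw_heartbeat_ms is non-zero. */
	if (!l->max_hw_heartbeat_ms && l->max_timeout && t > l->max_timeout)
		return false;

	return true;
}
```

With min_timeout = 2 and max_timeout = 60, a request for 61 seconds is
rejected, but the same request is accepted once max_hw_heartbeat_ms is
non-zero, because the upper bound then no longer applies.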
| 119 | |
| 120 | The list of watchdog operations is defined as:: |
| 121 | |
| 122 | struct watchdog_ops { |
| 123 | struct module *owner; |
| 124 | /* mandatory operations */ |
| 125 | int (*start)(struct watchdog_device *); |
| 126 | /* optional operations */ |
| 127 | int (*stop)(struct watchdog_device *); |
| 128 | int (*ping)(struct watchdog_device *); |
| 129 | unsigned int (*status)(struct watchdog_device *); |
| 130 | int (*set_timeout)(struct watchdog_device *, unsigned int); |
| 131 | int (*set_pretimeout)(struct watchdog_device *, unsigned int); |
| 132 | unsigned int (*get_timeleft)(struct watchdog_device *); |
| 133 | int (*restart)(struct watchdog_device *); |
| 134 | long (*ioctl)(struct watchdog_device *, unsigned int, unsigned long); |
| 135 | }; |
| 136 | |
| 137 | It is important that you first define the module owner of the watchdog timer |
| 138 | driver's operations. This module owner will be used to lock the module when |
| 139 | the watchdog is active. (This avoids a system crash if you unload the |
| 140 | module while /dev/watchdog is still open.) |
| 141 | |
| 142 | Some operations are mandatory and some are optional. The mandatory operations |
| 143 | are: |
| 144 | |
| 145 | * start: this is a pointer to the routine that starts the watchdog timer |
| 146 | device. |
| 147 | The routine needs a pointer to the watchdog timer device structure as a |
| 148 | parameter. It returns zero on success or a negative errno code for failure. |
| 149 | |
| 150 | Not all watchdog timer hardware supports the same functionality. That's why |
| 151 | all other routines/operations are optional. They only need to be provided if |
| 152 | they are supported. These optional routines/operations are: |
| 153 | |
| 154 | * stop: this routine stops the watchdog timer device. |
| 155 | |
| 156 | The routine needs a pointer to the watchdog timer device structure as a |
| 157 | parameter. It returns zero on success or a negative errno code for failure. |
| 158 | Some watchdog timer hardware can only be started and not be stopped. A |
| 159 | driver supporting such hardware does not have to implement the stop routine. |
| 160 | |
| 161 | If a driver has no stop function, the watchdog core will set WDOG_HW_RUNNING |
| 162 | and start calling the driver's keepalive ping function after the watchdog |
| 163 | device is closed. |
| 164 | |
| 165 | If a watchdog driver does not implement the stop function, it must set |
| 166 | max_hw_heartbeat_ms. |
| 167 | * ping: this is the routine that sends a keepalive ping to the watchdog timer |
| 168 | hardware. |
| 169 | |
| 170 | The routine needs a pointer to the watchdog timer device structure as a |
| 171 | parameter. It returns zero on success or a negative errno code for failure. |
| 172 | |
| 173 | Most hardware that does not support this as a separate function uses the |
| 174 | start function to restart the watchdog timer hardware. And that's also what |
| 175 | the watchdog timer driver core does: to send a keepalive ping to the watchdog |
| 176 | timer hardware it will either use the ping operation (when available) or the |
| 177 | start operation (when the ping operation is not available). |
| 178 | |
| 179 | (Note: the WDIOC_KEEPALIVE ioctl call will only be active when the |
| 180 | WDIOF_KEEPALIVEPING bit has been set in the options field of the watchdog's |
| 181 | info structure.) |
| 182 | * status: this routine checks the status of the watchdog timer device. The |
| 183 | status of the device is reported with watchdog WDIOF_* status flags/bits. |
| 184 | |
| 185 | WDIOF_MAGICCLOSE and WDIOF_KEEPALIVEPING are reported by the watchdog core; |
| 186 | it is not necessary to report those bits from the driver. Also, if no status |
| 187 | function is provided by the driver, the watchdog core reports the status bits |
| 188 | provided in the bootstatus variable of struct watchdog_device. |
| 189 | |
| 190 | * set_timeout: this routine checks and changes the timeout of the watchdog |
| 191 | timer device. It returns 0 on success, -EINVAL for "parameter out of range" |
| 192 | and -EIO for "could not write value to the watchdog". On success this |
| 193 | routine should set the timeout value of the watchdog_device to the |
| 194 | achieved timeout value (which may be different from the requested one |
| 195 | because the watchdog does not necessarily have a 1 second resolution). |
| 196 | |
| 197 | Drivers implementing max_hw_heartbeat_ms set the hardware watchdog heartbeat |
| 198 | to the minimum of timeout and max_hw_heartbeat_ms. Those drivers set the |
| 199 | timeout value of the watchdog_device either to the requested timeout value |
| 200 | (if it is larger than max_hw_heartbeat_ms), or to the achieved timeout value. |
| 201 | (Note: the WDIOF_SETTIMEOUT flag needs to be set in the options field of the |
| 202 | watchdog's info structure.) |
| 203 | |
| 204 | If the watchdog driver does not have to perform any action but setting the |
| 205 | watchdog_device.timeout, this callback can be omitted. |
| 206 | |
| 207 | If set_timeout is not provided but WDIOF_SETTIMEOUT is set, the watchdog |
| 208 | infrastructure updates the timeout value of the watchdog_device internally |
| 209 | to the requested value. |
| 210 | |
| 211 | If the pretimeout feature is used (WDIOF_PRETIMEOUT), then set_timeout must |
| 212 | also take care of checking if pretimeout is still valid and set up the timer |
| 213 | accordingly. This can't be done in the core without races, so it is the |
| 214 | duty of the driver. |
| 215 | * set_pretimeout: this routine checks and changes the pretimeout value of |
| 216 | the watchdog. It is optional because not all watchdogs support pretimeout |
| 217 | notification. The pretimeout value is not an absolute time, but the number of |
| 218 | seconds before the actual timeout would happen. It returns 0 on success, |
| 219 | -EINVAL for "parameter out of range" and -EIO for "could not write value to |
| 220 | the watchdog". A value of 0 disables pretimeout notification. |
| 221 | |
| 222 | (Note: the WDIOF_PRETIMEOUT flag needs to be set in the options field of the |
| 223 | watchdog's info structure.) |
| 224 | |
| 225 | If the watchdog driver does not have to perform any action but setting the |
| 226 | watchdog_device.pretimeout, this callback can be omitted. That means if |
| 227 | set_pretimeout is not provided but WDIOF_PRETIMEOUT is set, the watchdog |
| 228 | infrastructure updates the pretimeout value of the watchdog_device internally |
| 229 | to the requested value. |
| 230 | |
| 231 | * get_timeleft: this routine returns the time that's left before a reset. |
| 232 | * restart: this routine restarts the machine. It returns 0 on success or a |
| 233 | negative errno code for failure. |
| 234 | * ioctl: if this routine is present then it will be called first before we do |
| 235 | our own internal ioctl call handling. This routine should return -ENOIOCTLCMD |
| 236 | if a command is not supported. The parameters that are passed to the ioctl |
| 237 | call are: watchdog_device, cmd and arg. |
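
The set_timeout rules for drivers that implement max_hw_heartbeat_ms can be
sketched as follows. This is a hedged illustration, not a real driver:
struct demo_wdd is a stand-in for struct watchdog_device, the hw_period_ms
field stands in for a hardware register write, and the 2-second hardware
resolution (DEMO_HW_STEP_MS) is an assumption chosen so that the "achieved"
timeout can differ from the requested one.

```c
#include <errno.h>
#include <limits.h>

#define DEMO_HW_STEP_MS 2000u /* assumed hardware resolution: 2-second ticks */

/* Stand-in for the fields of struct watchdog_device used in this sketch. */
struct demo_wdd {
	unsigned int timeout;             /* seconds, as seen by user space */
	unsigned int min_timeout;         /* seconds */
	unsigned int max_hw_heartbeat_ms; /* milliseconds */
	unsigned int hw_period_ms;        /* hypothetical: value programmed into hardware */
};

/* Hypothetical set_timeout callback. */
static int demo_set_timeout(struct demo_wdd *wdd, unsigned int t)
{
	unsigned long long req_ms, hw_ms;

	/* "parameter out of range" */
	if (t == 0 || t < wdd->min_timeout || t > UINT_MAX / 1000)
		return -EINVAL;

	req_ms = (unsigned long long)t * 1000;
	if (req_ms > wdd->max_hw_heartbeat_ms) {
		/*
		 * Hardware heartbeat is min(timeout, max_hw_heartbeat_ms);
		 * the requested value is kept as the visible timeout.
		 */
		wdd->hw_period_ms = wdd->max_hw_heartbeat_ms;
		wdd->timeout = t;
		return 0;
	}

	/* Round up to the hardware resolution and report the achieved value. */
	hw_ms = (req_ms + DEMO_HW_STEP_MS - 1) / DEMO_HW_STEP_MS * DEMO_HW_STEP_MS;
	if (hw_ms > wdd->max_hw_heartbeat_ms)
		hw_ms = wdd->max_hw_heartbeat_ms;
	wdd->hw_period_ms = (unsigned int)hw_ms;
	wdd->timeout = wdd->hw_period_ms / 1000;
	return 0;
}
```

With max_hw_heartbeat_ms = 16000, a request for 5 seconds programs 6000 ms
and stores 6 in timeout, while a request for 30 seconds programs 16000 ms
and stores 30, leaving the gap to be covered by heartbeats sent from the
watchdog infrastructure.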
| 238 | |
| 239 | The status bits should (preferably) be set with set_bit, clear_bit and similar |
| 240 | bit operations. The status bits that are defined are: |
| 241 | |
| 242 | * WDOG_ACTIVE: this status bit indicates whether a watchdog timer device |
| 243 | is active from the user's perspective. User space is expected to send |
| 244 | heartbeat requests to the driver while this flag is set. |
| 245 | * WDOG_NO_WAY_OUT: this bit stores the nowayout setting for the watchdog. |
| 246 | If this bit is set then the watchdog timer cannot be stopped. |
| 247 | * WDOG_HW_RUNNING: Set by the watchdog driver if the hardware watchdog is |
| 248 | running. The bit must be set if the watchdog timer hardware cannot be |
| 249 | stopped. The bit may also be set if the watchdog timer is running after |
| 250 | booting, before the watchdog device is opened. If set, the watchdog |
| 251 | infrastructure will send keepalives to the watchdog hardware while |
| 252 | WDOG_ACTIVE is not set. |
| 253 | Note: when you register the watchdog timer device with this bit set, |
| 254 | then opening /dev/watchdog will skip the start operation but send a keepalive |
| 255 | request instead. |
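
The effect of WDOG_HW_RUNNING on opening the device, together with the
ping-or-start fallback used for keepalives, can be modelled roughly as
below. Everything here is a simplified stand-in: the demo_* types, the bit
values and the counters are hypothetical and exist only so the sketch
compiles on its own; the real core works on struct watchdog_device and uses
bit operations on the status field.

```c
#include <stddef.h>

/* Illustrative flag values, not the kernel's bit numbers. */
enum { DEMO_ACTIVE = 1u << 0, DEMO_HW_RUNNING = 1u << 1 };

struct demo_wdd;

struct demo_ops {
	int (*start)(struct demo_wdd *); /* mandatory */
	int (*ping)(struct demo_wdd *);  /* optional, may be NULL */
};

struct demo_wdd {
	const struct demo_ops *ops;
	unsigned long status;
	int starts, pings; /* counters used by this example only */
};

/* Keepalive: use ping when available, otherwise fall back to start. */
static int demo_keepalive(struct demo_wdd *w)
{
	return w->ops->ping ? w->ops->ping(w) : w->ops->start(w);
}

/* Model of opening the device node. */
static int demo_open(struct demo_wdd *w)
{
	int err;

	/* Hardware already running: skip start, send a keepalive instead. */
	if (w->status & DEMO_HW_RUNNING)
		err = demo_keepalive(w);
	else
		err = w->ops->start(w);

	if (!err)
		w->status |= DEMO_ACTIVE;
	return err;
}

/* Example callbacks that only count how often they were invoked. */
static int demo_count_start(struct demo_wdd *w) { w->starts++; return 0; }
static int demo_count_ping(struct demo_wdd *w)  { w->pings++;  return 0; }
```

A device registered with the running flag and a ping callback sees one ping
and no start on open; without a ping callback the same open ends up calling
start, as described for the ping operation.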
| 256 | |
| 257 | To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog |
| 258 | timer device) you can either: |
| 259 | |
| 260 | * set it statically in your watchdog_device struct with |
| 261 | |
| 262 | .status = WATCHDOG_NOWAYOUT_INIT_STATUS, |
| 263 | |
| 264 | (this sets the bit according to CONFIG_WATCHDOG_NOWAYOUT) or |
| 265 | * use the following helper function:: |
| 266 | |
| 267 | static inline void watchdog_set_nowayout(struct watchdog_device *wdd, |
| 268 | int nowayout) |
| 269 | |
| 270 | Note: |
| 271 | The WatchDog Timer Driver Core supports the magic close feature and |
| 272 | the nowayout feature. To use the magic close feature you must set the |
| 273 | WDIOF_MAGICCLOSE bit in the options field of the watchdog's info structure. |
| 274 | |
| 275 | The nowayout feature will overrule the magic close feature. |
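
How nowayout overrules magic close can be expressed as a small decision
function. This is a sketch of the rule stated in the note, assuming that a
device without WDIOF_MAGICCLOSE is stopped on any close; the function name
and its boolean parameters are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: should closing the device stop the watchdog?
 *   nowayout           - WDOG_NO_WAY_OUT is set for the device
 *   magicclose_enabled - WDIOF_MAGICCLOSE is set in the info options
 *   magic_seen         - user space performed the magic close sequence
 */
static bool demo_stops_on_close(bool nowayout, bool magicclose_enabled,
				bool magic_seen)
{
	/* nowayout always wins: the watchdog keeps running. */
	if (nowayout)
		return false;

	/* With magic close, only an explicit magic sequence allows a stop. */
	if (magicclose_enabled)
		return magic_seen;

	/* Assumption: without magic close, a plain close stops the timer. */
	return true;
}
```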
| 276 | |
| 277 | To get or set driver specific data the following two helper functions should be |
| 278 | used:: |
| 279 | |
| 280 | static inline void watchdog_set_drvdata(struct watchdog_device *wdd, |
| 281 | void *data) |
| 282 | static inline void *watchdog_get_drvdata(struct watchdog_device *wdd) |
| 283 | |
| 284 | The watchdog_set_drvdata function allows you to add driver specific data. Its |
| 285 | arguments are the watchdog device to which the data should be attached and |
| 286 | a pointer to the data itself. |
| 287 | |
| 288 | The watchdog_get_drvdata function allows you to retrieve driver specific data. |
| 289 | The argument of this function is the watchdog device where you want to retrieve |
| 290 | data from. The function returns the pointer to the driver specific data. |
| 291 | |
| 292 | To initialize the timeout field, the following function can be used:: |
| 293 | |
| 294 | extern int watchdog_init_timeout(struct watchdog_device *wdd, |
| 295 | unsigned int timeout_parm, |
| 296 | struct device *dev); |
| 297 | |
| 298 | The watchdog_init_timeout function allows you to initialize the timeout field |
| 299 | using the module timeout parameter or by retrieving the timeout-sec property from |
| 300 | the device tree (if the module timeout parameter is invalid). Best practice is |
| 301 | to set the default timeout value as timeout value in the watchdog_device and |
| 302 | then use this function to set the user "preferred" timeout value. |
| 303 | This routine returns zero on success and a negative errno code for failure. |
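
The precedence described above (module parameter first, then the device
tree property, otherwise the driver default) can be modelled like this. It
is a user-space approximation under stated assumptions: demo_init_timeout is
not the kernel function, the dt_timeout argument stands in for reading
"timeout-sec" (0 meaning the property is absent), and the exact return
value in each edge case is an assumption rather than something this
document specifies.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the fields of struct watchdog_device used here. */
struct demo_wdd {
	unsigned int timeout;     /* preset to the driver default */
	unsigned int min_timeout; /* seconds */
	unsigned int max_timeout; /* seconds, 0 = not set */
};

static bool demo_timeout_valid(const struct demo_wdd *w, unsigned int t)
{
	return t != 0 && t >= w->min_timeout &&
	       (!w->max_timeout || t <= w->max_timeout);
}

/* Hypothetical model of the selection logic. */
static int demo_init_timeout(struct demo_wdd *w, unsigned int timeout_parm,
			     unsigned int dt_timeout)
{
	int ret = 0;

	/* 1. A valid module parameter wins. */
	if (timeout_parm) {
		if (demo_timeout_valid(w, timeout_parm)) {
			w->timeout = timeout_parm;
			return 0;
		}
		ret = -EINVAL;
	}

	/* 2. Otherwise try the value taken from the device tree. */
	if (dt_timeout) {
		if (demo_timeout_valid(w, dt_timeout)) {
			w->timeout = dt_timeout;
			return 0;
		}
		ret = -EINVAL;
	}

	/* 3. Nothing usable: w->timeout keeps the driver default. */
	return ret;
}
```

Because the default is stored in the timeout field before the call, a
failure leaves the device with a usable value, which is why setting the
default first is recommended.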
| 304 | |
| 305 | To disable the watchdog on reboot, the driver must call the following helper:: |
| 306 | |
| 307 | static inline void watchdog_stop_on_reboot(struct watchdog_device *wdd); |
| 308 | |
| 309 | To disable the watchdog when unregistering the watchdog, the driver must call |
| 310 | the following helper. Note that this will only stop the watchdog if the |
| 311 | nowayout flag is not set. |
| 312 | |
| 313 | :: |
| 314 | |
| 315 | static inline void watchdog_stop_on_unregister(struct watchdog_device *wdd); |
| 316 | |
| 317 | To change the priority of the restart handler the following helper should be |
| 318 | used:: |
| 319 | |
| 320 | void watchdog_set_restart_priority(struct watchdog_device *wdd, int priority); |
| 321 | |
| 322 | Follow these guidelines when setting the priority: |
| 323 | |
| 324 | * 0: should be called as a last resort, has limited restart capabilities |
| 325 | * 128: default restart handler, use if no other handler is expected to be |
| 326 | available, and/or if restart is sufficient to restart the entire system |
| 327 | * 255: highest priority, will preempt all other restart handlers |
| 328 | |
| 329 | To raise a pretimeout notification, the following function should be used:: |
| 330 | |
| 331 | void watchdog_notify_pretimeout(struct watchdog_device *wdd) |
| 332 | |
| 333 | The function can be called in interrupt context. If the watchdog pretimeout |
| 334 | governor framework (kbuild CONFIG_WATCHDOG_PRETIMEOUT_GOV symbol) is enabled, |
| 335 | an action is taken by a preconfigured pretimeout governor preassigned to |
| 336 | the watchdog device. If the watchdog pretimeout governor framework is not |
| 337 | enabled, watchdog_notify_pretimeout() prints a notification message to |
| 338 | the kernel log buffer. |
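
The two outcomes of a pretimeout notification can be pictured with the
following user-space model. It is only a sketch: the demo_* types are
stand-ins, the governor is reduced to a single callback, and writing to
stderr takes the place of the kernel log. None of these names are kernel
APIs.

```c
#include <stddef.h>
#include <stdio.h>

struct demo_wdd;

/* Stand-in for a pretimeout governor: a name plus the action to take. */
struct demo_governor {
	const char *name;
	void (*pretimeout)(struct demo_wdd *);
};

/* Stand-in for struct watchdog_device. */
struct demo_wdd {
	int id;
	const struct demo_governor *gov; /* NULL if none is assigned */
	int gov_calls;                   /* counter used by this example only */
};

/* Model of raising a pretimeout notification. */
static void demo_notify_pretimeout(struct demo_wdd *w)
{
	if (w->gov && w->gov->pretimeout) {
		/* A governor is assigned: let it decide what to do. */
		w->gov->pretimeout(w);
		return;
	}

	/* No governor available: just leave a message in the log. */
	fprintf(stderr, "watchdog%d: pretimeout event\n", w->id);
}

/* Example governor action that only records that it ran. */
static void demo_count_action(struct demo_wdd *w)
{
	w->gov_calls++;
}
```

A driver's interrupt handler would raise the notification once per
pretimeout interrupt and leave the policy decision to whichever governor is
assigned to the device.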