HAL Component — RTMON
rtmon — monitor RT faults and generate estop signal based on a parameterized function
This component is an example how realtime faults (e.g. timing violations) can be dealt with. The realtime API (RTAPI) only collects relevant events but does not act upon them; every RT fault causes a call to the RTAPI exception handler, which is a function in RTAPI printing a message to the system log by default.
This exception handler may be overridden within a RT HAL component by means of the rtapi_set_exception(new_handler) function. The arguments of the handler provide detail about the cause of the fault.
The rtmon component intercepts this exception handler, and applies policy - for instance, to generate an estop signal based on a set of parameters.
The scheme implemented here was inspired by Andy Pugh’s smart serial code, which works as follows:
There is a programable increment, decrement and limit. If there is an error then the error count increments, if no error it decrements, when you hit the limit, you trigger a reaction.
No decrement = no fault healing. No increment = no fault reaction.
Assuming that the rtmon thread function is added to the servo thread (1mS) , then an increment of 100 and decrement of 1 and threshold of 500 would trigger a fault reaction after 5 seconds of more than ten errors a second.
Another scenario could be where a certain amount of errors during startup is tolerated: If you know that there will be 100 faults at startup, but those are OK, but any later ones are a problem, then increment of 1, decrement of zero and threshold of 101 (simplistically) gives that behaviour.
rtmon.N (requires a floating-point thread)
estop bit out - activated if more than <threshold> exceptions occur.
estop-reset bit in - a positive edge on this pin clears the estop and fault_count pins
rt-faults u32 out - total number of RT scheduling violations observed
fault-count u32 out - current level for RT scheduling violations; reset on estop_reset positive edge
stats-updates u32 out - total number of rtapi_task_update_stats() calls so far
fault-inc u32 rw (default: 1) - increment fault_count by fault_inc for each RT fault detected
fault-dec u32 rw (default: 0) - decrement fault_count in each thread iteration
fault-limit u32 rw (default: 1) - if fault_count exceeds fault_limit, cause an estop
autoclear-estop bit rw (default: 0) - clear estop condition once fault_count reaches zero
stats-divisor u32 rw (default: 1000) - rtapi_task_update_stats() update frequency