isFailedNode - what it does, what it doesn't #1447

AlCalzone · 2021-01-19T11:49:17Z

AlCalzone
Jan 19, 2021
Maintainer

/cc @hanskroner - like we discussed

hanskroner · 2021-01-20T13:22:04Z

The Z-Wave protocol maintains a "failed Node ID list." The list is quite small, typically only 5 entries - the oldest entry gets booted if a new one comes in when the list is full. Whether or not it is used by the protocol in its internal workings is not really a question I can answer.
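To make the eviction behavior concrete, here is a toy model of such a list in TypeScript. It is only a sketch: the class and method names are made up, the capacity of 5 is just the "typical" figure mentioned above, and whether a node that is already listed gets moved to the back when it misses another ACK is an assumption (here it keeps its position). The real list lives inside the controller's protocol firmware and cannot be written by the host.

```typescript
// Toy model of the controller's failed Node ID list. Illustrative only:
// the names and the default capacity of 5 are assumptions, not a real API.
class FailedNodeList {
  private readonly entries: number[] = []; // oldest entry first
  private readonly capacity: number;

  constructor(capacity: number = 5) {
    this.capacity = capacity;
  }

  // Record a node that missed an ACK. If the list is full, the oldest entry is dropped.
  add(nodeId: number): void {
    if (this.entries.includes(nodeId)) return; // assumption: already-listed nodes keep their slot
    if (this.entries.length >= this.capacity) this.entries.shift();
    this.entries.push(nodeId);
  }

  has(nodeId: number): boolean {
    return this.entries.includes(nodeId);
  }
}

const list = new FailedNodeList();
for (const nodeId of [2, 3, 4, 5, 6, 7]) list.add(nodeId);
console.log(list.has(2)); // false: node 2 was the oldest entry and got evicted
console.log(list.has(7)); // true
```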

Let's first discuss what a host application typically uses it for:

Controller devices almost always come across this list for the first time when looking at "Remove Failed Node" or "Replace Failed Node" functionality. As a safeguard, those function calls will not allow removing/replacing a Node ID that's not on the failed list.

When excluding a node from the network using the "Remove Node from Network" call, the process requires the node to be excluded to be placed in "learn" mode. The controller resets the node's Home and Node IDs and then the node resets itself to its factory defaults. None of this happens with "Remove Failed" or "Replace Failed", so it's important that the node really is failed and won't return to the network. If the device does return to the network, there'll now be two identical Node IDs, which will confuse the bejeezus out of the MAC layer - hence the safeguard.

The Z-Wave protocol will move a Node ID into the failed list whenever it fails to ACK a message - an application cannot directly write to this list. This is why the documentation requires "pinging" the Node ID with a NOP before attempting "Remove Failed" or "Replace Failed" - if the node fails to ACK this request, it'll be moved to the failed list and the calls to "Remove Failed" or "Replace Failed" will be allowed.
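The ping-then-remove sequence could look roughly like this from the host side. This is a sketch under assumptions: `FailedNodeController` is a hypothetical interface invented for this example, not the actual API of any Z-Wave library, and `ping` is assumed to send a NOP with an ACK requested and resolve to `false` when no ACK arrives.

```typescript
// Hypothetical minimal controller surface, for illustration only.
interface FailedNodeController {
  // Sends a NOP with ACK requested; resolves true if the node ACKed it.
  ping(nodeId: number): Promise<boolean>;
  // Resolves true if the Node ID is currently on the failed list.
  isFailedNode(nodeId: number): Promise<boolean>;
  // Issues "Remove Failed Node" for the given Node ID.
  removeFailedNode(nodeId: number): Promise<void>;
}

// Returns true if the node was removed, false if removal was skipped.
async function pingThenRemoveFailedNode(
  controller: FailedNodeController,
  nodeId: number,
): Promise<boolean> {
  // A missed ACK here is what moves the Node ID onto the failed list.
  const acked = await controller.ping(nodeId);
  if (acked) return false; // the node answered, so it is not failed

  // Double-check the list before asking the controller to remove the node.
  if (!(await controller.isFailedNode(nodeId))) return false;

  await controller.removeFailedNode(nodeId);
  return true;
}
```

The explicit `isFailedNode` check is redundant if the ping behaves as described, but it mirrors the safeguard the controller applies anyway and gives the host a clean way to bail out.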

So, this list's content is basically the last 5 Node IDs that missed ACKing a transmission from the Controller. Being in this list doesn't mean a Node ID is "failed" the way an application might define "failed". A device on the list might've been on a flaky link and missed an ACK, but might reply if tried again. Or a wall plug might've been unplugged when the controller tried to communicate with it, but is now plugged back in. Or the controller might've tried to message a Non-Listening (NL) node, which then gets put on the failed list for not replying, but is actually sleeping according to its configured Wake-Up Interval and is not yet due for a check-in. This list isn't something that an application wanting to monitor the quality of a mesh network's wireless links, or the "alive/dead" status of a node, should be looking at.

2 replies

AlCalzone Jan 20, 2021
Maintainer Author

This is why the documentation requires "pinging" the Node ID with a NOP before attempting "Remove Failed" or "Replace Failed"

I was under the impression that a single ping wouldn't be enough. If it is, I can scrap #1444 and just add a ping in both controller methods.

hanskroner Jan 21, 2021

I don't recall if the documentation explicitly mentions it, but a single transmission that misses an ACK (after however many retransmission attempts the protocol does) is enough for a Node ID to be moved to the failed list. Note that, of course, the transmission must be sent with the transmit option requesting an ACK set. As soon as the Node ID ACKs a transmission, it gets removed from the failed list.
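Put as a state update, the rule reads roughly as follows. This is a simplified sketch, not the protocol's implementation: the `TransmitResult` shape and the function name are invented for illustration, `acked` is assumed to reflect the outcome after the protocol's own retransmissions, and the list's size limit is ignored.

```typescript
// Hypothetical summary of one transmission, for illustration only.
interface TransmitResult {
  nodeId: number;
  ackRequested: boolean; // was the transmit option requesting an ACK set?
  acked: boolean; // outcome after the protocol's own retransmission attempts
}

// Applies the rule to a set standing in for the failed list (capacity limit omitted).
function updateFailedList(failed: Set<number>, result: TransmitResult): void {
  if (!result.ackRequested) return; // without an ACK request, nothing can be concluded
  if (result.acked) {
    failed.delete(result.nodeId); // any ACK takes the node off the list
  } else {
    failed.add(result.nodeId); // a single missed ACK is enough to put it on the list
  }
}
```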
