Please describe what you would like to see
In #652 the problem was raised that bluechi-controller detects a disconnect of a bluechi-agent quite late, for example when the cable is unplugged and a command is issued to that agent before the connection timeout is hit. This leaves a "zombie" agent in the bluechi-controller: it still lists the agent as online and refuses reconnects from that agent because its name is still in use.
This can be mitigated via the TCP KeepAlive options introduced in #674, but detection still takes quite a while, depending on various TCP settings such as retransmissions.
The bluechi-agent, on the other hand, can detect a disconnect rather quickly thanks to the Heartbeat feature. A similar periodic, application-layer check of the connection status could be used in the bluechi-controller as well: based on the last seen timestamp, it could actively disconnect nodes.
Note: This only makes sense for rather reliable networks, I think, and should be deactivated by default (so no overhead).
Please describe the solution you'd like
The bluechi-controller uses the same event-based mechanism that the bluechi-agent uses for its heartbeat. At a configurable interval, it checks for each online node that the last seen timestamp is not older than a configurable threshold. If it is older, the controller actively disconnects the node.
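The check described above could look roughly like the following Python sketch. It is only an illustration of the idea, not BlueChi code: the `Node` class, `find_stale_nodes`, and `heartbeat_check` are hypothetical names, and "disconnecting" is reduced to flipping a flag.

```python
import time
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for the controller's per-node state."""
    name: str
    online: bool
    last_seen_ms: int  # monotonic timestamp of the last message from the agent


def find_stale_nodes(nodes, now_ms, threshold_ms):
    """Return the online nodes whose last seen timestamp is older than the threshold."""
    return [n for n in nodes if n.online and now_ms - n.last_seen_ms > threshold_ms]


def heartbeat_check(nodes, threshold_ms, now_ms=None):
    """One run of the periodic check: mark every stale node as offline."""
    if now_ms is None:
        now_ms = time.monotonic_ns() // 1_000_000
    stale = find_stale_nodes(nodes, now_ms, threshold_ms)
    for node in stale:
        # A real controller would close the connection and release the node
        # name here so the agent can reconnect; this sketch only flips a flag.
        node.online = False
    return [n.name for n in stale]
```

In the controller, a timer event firing at the configured interval would call something like `heartbeat_check`. Using a monotonic clock (assumed here) avoids false disconnects when the wall clock is adjusted.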
New configuration options for bluechi-controller
HeartbeatInterval: The interval, in milliseconds, for checking the last seen timestamps of nodes. A value of 0 disables the check.
NodeHeartbeatThreshold: If now - last_seen_timestamp > NodeHeartbeatThreshold, the node is actively disconnected.
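To make the interplay of the two options concrete, here is a small sketch of their semantics. It assumes that NodeHeartbeatThreshold is also given in milliseconds (the proposal does not state its unit). The function names and the example values are made up for illustration.

```python
def should_disconnect(now_ms, last_seen_ms, heartbeat_interval_ms, node_heartbeat_threshold_ms):
    """Apply the proposed rule for a single node."""
    if heartbeat_interval_ms == 0:
        # HeartbeatInterval=0 disables the check entirely.
        return False
    return now_ms - last_seen_ms > node_heartbeat_threshold_ms


def worst_case_detection_ms(heartbeat_interval_ms, node_heartbeat_threshold_ms):
    """Upper bound on the time between the last message and the disconnect.

    A node becomes stale once the threshold has elapsed, and the next check
    runs at most one interval later. Returns None when the check is disabled.
    """
    if heartbeat_interval_ms == 0:
        return None
    return node_heartbeat_threshold_ms + heartbeat_interval_ms
```

For example, with a hypothetical HeartbeatInterval of 2000 and NodeHeartbeatThreshold of 6000, a silent node would be disconnected no later than 8000 ms after its last message. A smaller interval tightens this bound at the cost of more frequent checks.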
Implement verify and disconnect logic
Implement integration tests
Extend documentation (man pages, examples, etc.)