Fix for cooldown time in health checker plugin #444
Merged
Use InactiveExitTimestamp instead of ActiveEnterTimestamp to compute the cooldown period in the health check monitor.
With ActiveEnterTimestamp, we encountered a race condition:
The docker service was configured with RestartSec=10, which matches the interval at which NPD invokes the health checker. The systemd restarts of the service and the health checks fell into sync, so the service was killed every 10 seconds. The cooldown period never took effect: each start attempt by systemd was killed before the service reached the active state, so ActiveEnterTimestamp was never updated.
InactiveExitTimestamp is recorded when the unit leaves the inactive state (inactive → activating), so using it lets the health checker detect a restart attempt by systemd even before the service reaches the active state.