last_datapoint = invocation time for collectors #309
Merged
We have a collector that occasionally takes ages to run (40-50 seconds), and sometimes a lot longer (600+ seconds).
Killing it after 600 seconds is fine, but we found that check_children() in tcollector.py kept killing this collector right after it was respawned, because its last_datapoint was still too old on every invocation of check_children(). And because it was being killed off every time, last_datapoint never got updated again after it hit the 600s limit the first time.
The fix (well, a fix that works for us, at least) is to set a collector's last_datapoint to the time it was spawned, rather than the time the object was first created (which could be ages ago).
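
To illustrate the behaviour described above, here is a minimal sketch. It is not the actual tcollector.py code: the `Collector` class, the `reset_on_spawn` flag, the `now` parameters and the `ALLOWED_INACTIVITY` constant are simplified stand-ins assumed for this example. Only the names last_datapoint and check_children, and the 600s limit, come from the description.

```python
import time

ALLOWED_INACTIVITY = 600  # seconds; the kill limit mentioned in the description


class Collector:
    """Hypothetical, stripped-down stand-in for a collector record."""

    def __init__(self, name, now=None):
        self.name = name
        self.running = False
        # Set once, when the object is created (possibly long before a spawn).
        self.last_datapoint = int(time.time() if now is None else now)


def spawn_collector(col, now=None, reset_on_spawn=True):
    """Mark a collector as running; optionally reset last_datapoint (the fix)."""
    now = int(time.time() if now is None else now)
    col.running = True
    if reset_on_spawn:
        # The fix: count inactivity from the spawn time, not object creation.
        col.last_datapoint = now


def check_children(collectors, now=None):
    """Stop collectors that have been silent for too long; return their names."""
    now = int(time.time() if now is None else now)
    killed = []
    for col in collectors:
        if col.running and col.last_datapoint < now - ALLOWED_INACTIVITY:
            col.running = False
            killed.append(col.name)
    return killed


# Without the reset, a collector created at t=0 and respawned at t=1000
# is killed on the very next check, because last_datapoint is still 0.
old = Collector("slow", now=0)
spawn_collector(old, now=1000, reset_on_spawn=False)
killed_without_fix = check_children([old], now=1010)

# With the reset, the same collector gets the full 600s after each spawn.
new = Collector("slow", now=0)
spawn_collector(new, now=1000, reset_on_spawn=True)
killed_with_fix = check_children([new], now=1010)
```

In this sketch the collector is still killed if it stays silent for more than 600 seconds after being spawned, so the limit itself is unchanged; only the starting point of the clock moves.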