pkg/osquery/runtime improvements, largely around improving test flakiness #1798
I've noticed that the runtime tests have been particularly flaky on Windows lately and it's been holding up CI significantly because we often have to wait for the 10-min test timeout to hit before test failure. See e.g. https://github.com/kolide/launcher/actions/runs/10094889355/job/27918355827.
Issues found
- runner.Shutdown was getting stuck; tests would panic after 10 minutes due to timeout. (This really holds up CI, plus the panic means we don't really get any useful information about what part of the test got stuck.)
- waitHealthy reported that the osquery instance was healthy before we'd actually finished launching the instance. I believe that this was what caused runner.Shutdown to get stuck.
Fixes in this PR
Test improvements
- Add a timeout around runner.Shutdown, so that we can quickly fail and get useful troubleshooting information when Shutdown is stuck, instead of waiting 10 minutes to hit the test timeout and panic, which gives us no useful information about the issue
- When runner.Healthy() returns no error, also confirm that the launch function has finished all critical work, and add a short sleep buffer before proceeding
- [...] runner.Healthy when that fails; include runner logs in failure messages
Code improvements
kolide_grpc