Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eval broker: use write lock when reaping cancelable evals #16112

Merged
merged 1 commit into from
Feb 10, 2023

Conversation

tgross
Copy link
Member

@tgross tgross commented Feb 9, 2023

The eval broker's Cancelable method used by the cancelable eval reaper mutates
the slice of cancelable evals by removing a batch at a time from the slice. But
this method unsafely uses a read lock despite this mutation. Under normal
workloads this is likely to be safe but when the eval broker is under the heavy
load this feature is intended to fix, we're likely to have a race
condition. Switch this to a write lock, like the other locks that mutate the
eval broker state.

This changeset also adjusts the timeout to allow poorly-sized Actions runners
more time to schedule the appropriate goroutines. The test has also been updated
to use shoenig/test/wait so we can have sensible reporting of the results
rather than just a timeout error when things go wrong.

@tgross tgross added theme/testing Test related issues theme/flaky-tests labels Feb 9, 2023
@tgross tgross added this to the 1.5.0 milestone Feb 9, 2023
@tgross tgross added the backport/1.4.x backport to 1.4.x release line label Feb 9, 2023
@tgross tgross changed the title eval broker: extend timeout on integration test eval broker test: enable canceled eval reaper Feb 9, 2023
@tgross tgross self-assigned this Feb 9, 2023
The eval broker's `Cancelable` method used by the cancelable eval reaper mutates
the slice of cancelable evals by removing a batch at a time from the slice. But
this method unsafely uses a read lock despite this mutation. Under normal
workloads this is likely to be safe but when the eval broker is under the heavy
load this feature is intended to fix, we're likely to have a race
condition. Switch this to a write lock, like the other locks that mutate the
eval broker state.

This changeset also adjusts the timeout to allow poorly-sized Actions runners
more time to schedule the appropriate goroutines. The test has also been updated
to use `shoenig/test/wait` so we can have sensible reporting of the results
rather than just a timeout error when things go wrong.
@tgross tgross changed the title eval broker test: enable canceled eval reaper eval broker: use write lock when reaping cancelable evals Feb 10, 2023
@tgross tgross marked this pull request as ready for review February 10, 2023 15:09
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.4.x backport to 1.4.x release line theme/flaky-tests theme/testing Test related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants