quota: throughput racy enforcement bug fixes #467
Merged
…enforcement

The previous implementation has a racy update of throughputUsagePerNamespace.resetAt and dataPoints that could lead to quota over-enforcement. This commit addresses it by wrapping them in one struct that is updated atomically. The new tradeoff is under-enforcement during counter reset, a better tradeoff imo. More details can be found in the commit.
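For readers who want a picture of the "one struct with atomic updates" idea, here is a minimal sketch. It is not go-carbon's actual code: the type names `usageWindow` and `throughputUsage`, the `add` method, and the use of `atomic.Pointer` (Go 1.19+) are all assumptions made for illustration. The point is that the reset deadline and the counter are swapped as a pair, and that points added to a window being retired are lost, which is the under-enforcement tradeoff mentioned above.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// usageWindow pairs a reset deadline with the counter it governs, so the two
// are always replaced together. Hypothetical type, not go-carbon's.
type usageWindow struct {
	resetAt    time.Time
	dataPoints atomic.Int64
}

// throughputUsage keeps the current window behind an atomic pointer (Go 1.19+).
type throughputUsage struct {
	interval time.Duration
	window   atomic.Pointer[usageWindow]
}

func newThroughputUsage(now time.Time, interval time.Duration) *throughputUsage {
	u := &throughputUsage{interval: interval}
	u.window.Store(&usageWindow{resetAt: now.Add(interval)})
	return u
}

// add records n data points and returns the count in the current window.
// If the window has expired, it is replaced as a whole with a single CAS.
// A goroutine that still holds the old window adds to a counter that is
// about to be discarded: under-enforcement, never over-enforcement.
func (u *throughputUsage) add(now time.Time, n int64) int64 {
	w := u.window.Load()
	if !now.Before(w.resetAt) {
		fresh := &usageWindow{resetAt: now.Add(u.interval)}
		if u.window.CompareAndSwap(w, fresh) {
			w = fresh
		} else {
			w = u.window.Load() // another goroutine already reset the window
		}
	}
	return w.dataPoints.Add(n)
}

// demo drives the counter with fixed timestamps so the result is deterministic.
func demo() []int64 {
	start := time.Unix(0, 0)
	u := newThroughputUsage(start, time.Minute)
	return []int64{
		u.add(start, 10),
		u.add(start.Add(30*time.Second), 5),
		u.add(start.Add(61*time.Second), 1), // past resetAt: fresh window
	}
}

func main() {
	fmt.Println(demo()) // [10 15 1]
}
```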
bom-d-van force-pushed the quota/throughput-edge-bug branch from a87da91 to 20d341b on May 18, 2022 11:09
When throughput quota usage might be exhausted in a child rule, the previous implementation updated the throughput counter immediately after the check, which could lead to over-accounting for parent quota configs. This commit changes the implementation to update the counter only once the traffic is confirmed to be within quota. At the same time, go-carbon also makes sure to report throttled data points for soft throughput quota enforcement (i.e. with the "none" dropping policy).
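A rough sketch of the check-then-commit ordering described here, under stated assumptions: `quotaNode`, its fields, and this `throttle` function are hypothetical and single-goroutine only, and they do not reproduce go-carbon's trie or its real `throttle` signature. The first pass only checks every level; usage counters are bumped in a second pass that runs only if no hard quota rejected the traffic. A soft node (standing in for the "none" dropping policy) records throttled points but lets the traffic through.

```go
package main

import "fmt"

// quotaNode is a hypothetical stand-in for one level of the quota hierarchy.
type quotaNode struct {
	parent     *quotaNode
	throughput int64 // quota for the current window
	soft       bool  // true models the "none" dropping policy
	used       int64 // accepted data points
	throttled  int64 // data points reported as over quota
}

// throttle reports whether n data points arriving at leaf must be dropped.
// Not safe for concurrent use; a real implementation would need atomics.
func throttle(leaf *quotaNode, n int64) bool {
	// Pass 1: check every level without touching the usage counters.
	for cur := leaf; cur != nil; cur = cur.parent {
		if cur.used+n > cur.throughput {
			cur.throttled += n // always report, even for soft enforcement
			if !cur.soft {
				return true // hard quota exceeded: no counter is updated
			}
		}
	}
	// Pass 2: the traffic is accepted, so account for it at every level.
	for cur := leaf; cur != nil; cur = cur.parent {
		cur.used += n
	}
	return false
}

// demo sends two batches to a child whose quota is smaller than its parent's.
func demo() (dropped []bool, parentUsed, childThrottled int64) {
	sys := &quotaNode{throughput: 100}
	app := &quotaNode{parent: sys, throughput: 10}
	dropped = append(dropped, throttle(app, 8)) // fits everywhere
	dropped = append(dropped, throttle(app, 5)) // child would reach 13 > 10
	return dropped, sys.used, app.throttled
}

func main() {
	dropped, parentUsed, childThrottled := demo()
	fmt.Println(dropped, parentUsed, childThrottled) // [false true] 8 5
}
```

The parent's usage stays at 8 after the rejected batch, which is the over-accounting this commit is meant to avoid.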
The current quota config file is last-match-wins, and under heavy concurrent quota enforcement go-carbon should update the quota info of each node only once; otherwise it risks confusing and incorrect quota enforcement. For example, suppose we have the following quota configs:

[sys.*]
throughput = 1024

[sys.app]
throughput = 4096

If sys.app is updated twice, following the top-down order of the quota config file, there is a window during which sys.app has a throughput quota of 1024. If the namespace happens to receive more than 1024 data points during that window, it triggers incorrect throttling.

TODO: with updateChecker, it is now also straightforward to support first-match-wins in the quota config file, but we would have to introduce a new flag to opt in to it, for backward compatibility.
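One way to get "update each node only once" while keeping last-match-wins is to walk the rules bottom-up and skip any node that has already been written. The sketch below assumes exactly that; it is not taken from the commit. `quotaRule`, `applyQuotas`, and the `updated` map (a stand-in for updateChecker) are hypothetical, and the standard library's `path.Match` is swapped in for go-carbon's own pattern matching.

```go
package main

import (
	"fmt"
	"path"
)

// quotaRule is a hypothetical, flattened view of one config section.
type quotaRule struct {
	pattern    string
	throughput int64
}

// applyQuotas writes each namespace's quota exactly once. Rules are walked
// from the bottom of the file upward, so the first rule that matches a
// namespace is the last one in file order (last-match-wins).
func applyQuotas(rules []quotaRule, namespaces []string, set func(ns string, throughput int64)) {
	updated := make(map[string]bool) // stand-in for updateChecker
	for i := len(rules) - 1; i >= 0; i-- {
		for _, ns := range namespaces {
			if updated[ns] {
				continue // already has its final quota
			}
			if ok, err := path.Match(rules[i].pattern, ns); err == nil && ok {
				set(ns, rules[i].throughput)
				updated[ns] = true
			}
		}
	}
}

// demo applies the two rules from the example above and counts the writes.
func demo() (quotas map[string]int64, writes map[string]int) {
	quotas = make(map[string]int64)
	writes = make(map[string]int)
	rules := []quotaRule{{"sys.*", 1024}, {"sys.app", 4096}}
	applyQuotas(rules, []string{"sys.app", "sys.db"}, func(ns string, t int64) {
		quotas[ns] = t
		writes[ns]++
	})
	return quotas, writes
}

func main() {
	quotas, writes := demo()
	fmt.Println(quotas, writes) // map[sys.app:4096 sys.db:1024] map[sys.app:1 sys.db:1]
}
```

With this ordering sys.app never holds the 1024 value, so there is no window for incorrect throttling. Walking the rules top-down with the same skip check would give first-match-wins.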
bom-d-van force-pushed the quota/throughput-edge-bug branch from 20d341b to 99f8eb7 on May 18, 2022 12:01
note: a few tests for this PR still need to be added in a separate one; so far it has only been tested on our production, where it seems to be working.
note: also included some minor changes for DeepSource warnings and debugging code cleanup.
Three changes are made, but the most important one is quota: fix a race condition bug in trieIndex.applyQuotas. The other two commits are by-products of debugging the racy applyQuotas.
quota: fix a race condition bug in trieIndex.applyQuotas
quota: delay throughput counter update in trieindex.throttle
quota: refactor throughputUsagePerNamespace usage/quota tracking and enforcement