Add global-ratelimiter aggregator-side metrics #6171

Merged: 3 commits, Jul 18, 2024

Member

@Groxx Groxx commented Jul 17, 2024

Some of these metrics would have been useful while checking the initial rollout and troubleshooting. The rest seem potentially useful too, and are easy to collect.

I've also adjusted the histogram buckets shared by the other global ratelimiter metrics. Nothing planned is actually sensitive to the exact bucket values, so this should be harmless. The histograms are mostly for general "wait, why do we have 100x more/fewer than we expected?"-style discoveries.
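
To illustrate the kind of order-of-magnitude discovery the description mentions, here is a minimal sketch using exponentially spaced buckets. The helper names (`exponentialBuckets`, `bucketIndex`) and the bounds are hypothetical and are not the buckets this PR actually defines.

```go
package main

import (
	"fmt"
	"sort"
)

// exponentialBuckets returns n upper bounds: start, start*factor, start*factor^2, ...
// Hypothetical helper for illustration only; not the bucket definition from this PR.
func exponentialBuckets(start, factor float64, n int) []float64 {
	bounds := make([]float64, 0, n)
	v := start
	for i := 0; i < n; i++ {
		bounds = append(bounds, v)
		v *= factor
	}
	return bounds
}

// bucketIndex returns the index of the first bucket whose upper bound is >= v,
// or len(bounds) if v exceeds every bound (the overflow bucket).
func bucketIndex(bounds []float64, v float64) int {
	return sort.SearchFloat64s(bounds, v)
}

func main() {
	bounds := exponentialBuckets(1, 10, 5) // 1, 10, 100, 1000, 10000
	for _, v := range []float64{3, 300, 50000} {
		fmt.Printf("value %v -> bucket %d\n", v, bucketIndex(bounds, v))
	}
}
```

With a factor of 10, a value that is 100x larger than expected lands two buckets over, which is visible at a glance even when the exact bounds are arbitrary.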


codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.76%. Comparing base (8664922) to head (39c4a12).
Report is 6 commits behind head on master.

Additional details and impacted files
Files Coverage Δ
common/quotas/global/algorithm/requestweighted.go 95.55% <100.00%> (+1.07%) ⬆️
common/quotas/global/collection/collection.go 84.02% <100.00%> (+1.33%) ⬆️
common/quotas/global/rpc/client.go 76.19% <100.00%> (ø)

... and 11 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@3vilhamster 3vilhamster left a comment

Overall, it looks good. I think part of metrics verification can be done in unit tests, but that is not widely used in the repo.
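
As a rough sketch of what verifying metrics in a unit test could look like, the example below records counters into an in-memory fake and then inspects the totals. Everything here (`counterRecorder`, `fakeRecorder`, `handleUpdate`, and the metric names) is hypothetical and stands in for whatever metrics client and code under test the repo really uses.

```go
package main

import "fmt"

// counterRecorder is a hypothetical minimal metrics interface; real code would
// use whatever metrics client the package already depends on.
type counterRecorder interface {
	IncCounter(name string, delta int64)
}

// fakeRecorder stores counter totals in memory so tests can inspect them.
// It is not safe for concurrent use.
type fakeRecorder struct {
	counters map[string]int64
}

func newFakeRecorder() *fakeRecorder {
	return &fakeRecorder{counters: make(map[string]int64)}
}

func (f *fakeRecorder) IncCounter(name string, delta int64) {
	f.counters[name] += delta
}

// handleUpdate stands in for the code under test: it counts calls and the
// number of keys processed.
func handleUpdate(m counterRecorder, keys []string) {
	m.IncCounter("update_calls", 1)
	m.IncCounter("keys_processed", int64(len(keys)))
}

func main() {
	rec := newFakeRecorder()
	handleUpdate(rec, []string{"a", "b", "c"})
	handleUpdate(rec, []string{"d"})
	fmt.Println(rec.counters["update_calls"], rec.counters["keys_processed"])
}
```

A test would make the same calls and assert on `rec.counters` instead of printing.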

@@ -1729,7 +1731,8 @@ var ScopeDefs = map[ServiceIdx]map[int]scopeDefinition{
HashringScope: {operation: "Hashring"},

// currently used by both frontend and history, but may grow to other limiting-host-services.
FrontendGlobalRatelimiter: {operation: "GlobalRatelimiter"},
GlobalRatelimiter: {operation: "GlobalRatelimiter"},
Member

I suggest using lower-case. M3 lowercases everything, and using CamelCase makes metrics harder to find.

Member Author

@Groxx Groxx Jul 17, 2024

I mostly agree, lower-case is nicer for lots of things, but all of our "operation" tags are title-case at the moment. Not sure there's much benefit to breaking that long-held pattern.

Is it just for code-grepping difficulty, or does it make metric-querying harder somehow?

Member Author

(this is pretty easy to change if we do want to change things later, nothing we have now depends on this for monitoring/log search/etc)

@@ -538,3 +578,18 @@ func weighted[T numeric](newer, older T, weight float64) T {
type numeric interface {
constraints.Integer | constraints.Float
}

// non-sync version of sync.Once, for easier unlocking
Member

I'd make it explicit in the comments there "Do NOT use this for concurrent cases"

Member Author

Yeah, that is part of why I didn't make it a public type. It's not intended for reuse.
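
For readers without the diff at hand, a non-synchronized "once" could look roughly like the sketch below. The type name `once` and its shape are assumptions for illustration, not the code merged in this PR, and the unlock scenario in `main` is only a guess at what "easier unlocking" refers to.

```go
package main

import "fmt"

// once is a hypothetical, non-thread-safe analogue of sync.Once.
// Do NOT use it from multiple goroutines: it has no synchronization.
type once struct {
	done bool
}

// Do runs f only on the first call; later calls are no-ops.
func (o *once) Do(f func()) {
	if o.done {
		return
	}
	o.done = true
	f()
}

func main() {
	calls := 0
	release := func() { calls++ } // stands in for something like mu.Unlock()

	var unlock once
	unlock.Do(release) // early release on some code path
	unlock.Do(release) // e.g. a deferred call; does nothing the second time

	fmt.Println(calls)
}
```

Keeping such a type unexported, as the author describes, limits it to call sites where single-goroutine use is already guaranteed.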

@Groxx Groxx merged commit 4a51fb4 into cadence-workflow:master Jul 18, 2024
20 checks passed
@Groxx Groxx deleted the ratelimiter-agg-metrics branch July 18, 2024 19:37
jakobht pushed a commit to jakobht/cadence that referenced this pull request Aug 12, 2024