Introduces a quota subsystem #420
Conversation
(The branch was force-pushed several times during review: 851011f → 3775b56, a95134c → 30cd0cd, 0607c62 → 4a173f5, 7ae9dc1 → 3a7a882, 57522a3 → 3d77327.)
Hi guys, these changes are finally ready for review (it took a few force-pushes to get DeepSource happy). Can you take a look when you have time and let me know if we can get it merged? The changes have been deployed on prod for a few weeks and no new issues have been identified so far. As always, it's intended to be backward-compatible and should introduce no breaking changes.
How many stat metrics should I expect with this in the quota config? Or with this? As many as the quantity of metrics inside, or is this not going to resolve the `*` to the actual names?
Hi @azhiltsov, it's like writing a graphite query. For example, suppose that there are 3 top-level namespaces in the go-carbon instance: user, sys, and net.

```ini
[*]
metrics = 1,000,000
```

The config above is the same as the config below for this example:

```ini
[user]
metrics = 1,000,000

[sys]
metrics = 1,000,000

[net]
metrics = 1,000,000
```

Same for the other pattern. Hope this explains it.
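To illustrate the same expansion one level deeper (the sub-namespace names `sys.cpu` and `sys.mem` here are hypothetical, purely for illustration):

```ini
[sys.*]
metrics = 500,000

# ...would behave, for an instance with those two sub-namespaces, like:

[sys.cpu]
metrics = 500,000

[sys.mem]
metrics = 500,000
```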
The quota subsystem is made to improve control, reliability, and visibility of go-carbon. It is not a standard graphite component, but it is backward compatible and can be enabled optionally.

Caveat: the current implementation only supports the concurrent/realtime trie index.

The quota subsystem allows users to control how many resources can be consumed on a pattern-matching basis. Implemented controls include: data points (based on retention policy), disk size (logical and physical), throughput, metric count, and namespaces (i.e. immediate sub-directory count). More details can be found in doc/quotas.md in the PR.

An example configuration:

```ini
[*]
metrics = 1,000,000
logical_size = 250,000,000,000
physical_size = 50,000,000,000
data_points = max
throughput = max

[sys.app.*]
metrics = 3,000,000
logical_size = 1,500,000,000,000
physical_size = 100,000,000,000
data_points = 130,000,000,000

[/]
namespaces = 20
metrics = 10,000,000
logical_size = 2,500,000,000,000
physical_size = 2,500,000,000,000
data_points = 200,000,000,000
dropping_policy = new
```

Throttling control is implemented in `carbonserver`, while quota config is implemented in persister (mainly for convenience).
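As a rough mental model, each config section might parse into a per-pattern rule along these lines (a minimal sketch with hypothetical field names; the PR's actual types live in the persister and will differ):

```go
package example

// Quota is a hypothetical representation of one parsed config section,
// e.g. [sys.app.*]; field names here are illustrative, not the PR's.
type Quota struct {
	Pattern        string // glob-style pattern such as "sys.app.*" or "/"
	Namespaces     int64  // max immediate sub-directories ("/" only)
	Metrics        int64  // max metric (file) count
	LogicalSize    int64  // max logical whisper size in bytes
	PhysicalSize   int64  // max physical on-disk size in bytes
	DataPoints     int64  // max data points implied by retention policy
	Throughput     int64  // max received data points per interval
	DroppingPolicy string // e.g. "new": drop new metrics once over quota
}
```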
> simplifyPathError returns an error without the path in the error message. This
> simplifies the log a bit, as the path is usually printed separately, and the
> new error message is easier to filter in Elasticsearch and other tools.
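A minimal sketch of what such a helper could look like (assuming the path-carrying error is an `*os.PathError`; the PR's actual implementation may differ):

```go
package example

import (
	"errors"
	"fmt"
	"os"
)

// simplifyPathError drops the file path from a path error so the log line
// only carries the operation and the underlying cause; the caller is
// expected to log the path separately. Sketch only, not the PR's code.
func simplifyPathError(err error) error {
	var perr *os.PathError
	if errors.As(err, &perr) {
		return fmt.Errorf("%s: %w", perr.Op, perr.Err)
	}
	return err
}
```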
WHY: with the concurrent and realtime index, the disk scan should be set at an interval like 2 hours or longer. Counting the files in the trie index gives us more timely visibility into how many metrics are known right now.
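The idea, sketched with hypothetical names (the PR's trie code is more involved):

```go
package example

import "sync/atomic"

// trieIndex is a stand-in for the real index type; the point is that the
// file count is maintained incrementally on insert, so quota checks can
// read it at any time instead of waiting hours for the next disk scan.
type trieIndex struct {
	fileCount int64
}

func (t *trieIndex) insertFile(path string) {
	// ... insert path into the trie (omitted) ...
	atomic.AddInt64(&t.fileCount, 1)
}

func (t *trieIndex) knownMetrics() int64 {
	return atomic.LoadInt64(&t.fileCount)
}
```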
The original implementation would keep resending the quota and usage metrics after the initial flush, which is not helpful. A channel is now used to avoid duplicate flushes.
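One way this can work, as a sketch (the names and the exact hand-off are assumptions, not the PR's code): the producer publishes a fresh snapshot on a buffered channel, and the flush loop only emits when a new snapshot has actually arrived, so stale values are never re-sent.

```go
package example

// statSnapshot is a hypothetical container for one round of quota/usage stats.
type statSnapshot struct {
	quota, usage map[string]int64
}

// updates holds at most one pending snapshot.
var updates = make(chan statSnapshot, 1)

// publish queues a fresh snapshot; if an unflushed one is already pending,
// the new snapshot is dropped (it will be regenerated next cycle).
func publish(s statSnapshot) {
	select {
	case updates <- s:
	default:
	}
}

// flushLoop emits each snapshot exactly once; with no new snapshot on the
// channel, nothing is re-sent, which avoids the duplicate flushes.
func flushLoop(emit func(statSnapshot)) {
	for s := range updates {
		emit(s)
	}
}
```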
When it comes to point search, a map lookup is very fast, much faster than traversing the trie tree. What's more, the current trie implementation adopts a space-efficient strategy and children are not sorted, which means much higher CPU usage. We were seeing up to 700% CPU usage growth in a cluster receiving 1 million+ data points. By adopting a map-based implementation, we are able to cut up to 600% of CPU usage in that cluster, which makes it much more efficient to collect and control throughput usage.
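A minimal sketch of the map-based accounting idea (illustrative names; the real code also has to tie counts back to quota patterns):

```go
package example

import "sync"

// throughputUsage tracks received data points per quota pattern with a
// plain map plus a mutex: each incoming point costs one hash lookup
// instead of a walk over unsorted trie children.
type throughputUsage struct {
	mu     sync.Mutex
	counts map[string]int64
}

func newThroughputUsage() *throughputUsage {
	return &throughputUsage{counts: map[string]int64{}}
}

func (t *throughputUsage) record(pattern string, points int64) {
	t.mu.Lock()
	t.counts[pattern] += points
	t.mu.Unlock()
}

// overQuota reports whether the pattern has exceeded its throughput limit.
func (t *throughputUsage) overQuota(pattern string, limit int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counts[pattern] > limit
}
```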
Hi everyone, the subsystem has been in our production for some time and it appears to be stable. If there are no objections, I will proceed to merge the changes today. Feel free to let me know if you have concerns.
Yes, looks good, @bom-d-van, please proceed.

Thank you Denis. Merging it.