Add :most_recent aggregation to DirectFileStore #172
Conversation
Hi @SuperQ @Sinjo @dmagliola. Feel free to let me know what you think. Thanks.

Thinking about how gauge values are handled in other languages, and how, in single-threaded mode, "most recent" is the standard way to do things, I wonder if this should be the default way multi-process gauges work.

It requires cross-process locking, and we're trying to avoid that on performance grounds.
First of all, thank you for this PR! This is a great contribution :-D
Finally, Merry Xmas to all of you :-)
```diff
@@ -182,13 +185,15 @@ def process_id

       def aggregate_values(values)
         if @values_aggregation_mode == SUM
-          values.inject { |sum, element| sum + element }
+          values.map { |element| element[0] }.inject { |sum, value| sum + value }
```
For all 3 of these, could we `map(&:first)` and then do the same thing we were doing before? That feels a lot more readable to me. (So: `values.map(&:first).inject { |sum, element| sum + element }`, `values.map(&:first).max`, etc.)

I'm half wondering why we did this `inject` before, instead of just calling `.sum`... Is there a large performance penalty to `map`? Could we measure that? The reason I mention that is that we already have some users with performance problems on scrape, and I'd thus like to make this as fast as possible. If `map` is much slower, we should probably change this first one to the same inject as before, but with the body of the block as `sum + element[0]`... And I'm not sure about the ones below, whether that's the fastest way?
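To put numbers behind that question, a quick micro-benchmark could compare the two shapes. This is a hypothetical sketch (not part of the PR), assuming the benchmark-ips gem and the `[value, timestamp]` entry format introduced here:

```ruby
require "benchmark/ips"

# Simulated store contents: one [value, timestamp] pair per entry.
values = Array.new(10_000) { [rand, rand] }

Benchmark.ips do |x|
  # Two passes: extract the values first, then reduce them.
  x.report("map + inject") { values.map(&:first).inject { |sum, v| sum + v } }

  # Single pass: reduce directly over the pairs.
  x.report("inject only")  { values.inject(0.0) { |sum, pair| sum + pair[0] } }

  x.compare!
end
```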
I am very open to changing this code. I mostly just wanted to get it working without considering performance that much.
That's a great thing to hear.
So my ratelimit starts at 5000 requests/hour, and decreases, that is correct. But the ratelimit is reset every hour, so if I used
Merry Christmas to all of you too! :)
I hadn't looked at the code, I was responding to @SuperQ's comment. The issue with this approach is that it only works for set, not inc/dec. Thus it cannot be the default.

I think banning inc/dec on this aggregation is probably the way to go. Hell, we could raise for the moment on inc/dec while allowing set, then add the inter-process locking for inc/dec later if people really want it.

If you want inc/dec then there's no need for inter-process coordination; each process can keep it locally and we sum on exposition.
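A sketch of what that last point describes, with illustrative names (not the gem's API): each process accumulates its own inc/dec total, and the exposing process sums them at scrape time, with no cross-process locking.

```ruby
# Hypothetical per-process totals keyed by pid.
per_process_totals = {
  12345 => 3.0,  # process 12345 called inc(3)
  23456 => -1.0, # process 23456 called dec(1)
}

# On exposition, no coordination is needed: just sum the local totals.
exposed_value = per_process_totals.values.sum # => 2.0
```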
That's a really good point from @lawrencejones. I like your solution at first sight, but I'm a bit worried about two things:
Side thought... Would / could this be solved by "Custom Collectors" instead?
I think either is fine for this use case. Each has its caveats, but I think
The caveat I had to read up on with
Hello @stefansundin,

So, I think we should merge this feature, but I'd like to ask for a few changes, because as it stands, it's quite possible to use this in a way that will do something the user definitely didn't mean. Could we have:

Notes on making this the default:

Thank you!
(force-pushed from 1a7dee8 to 207654d)
@dmagliola I dropped the commit with the filename prefix change. I'd highly recommend testing the behavior for loading files with the old format. If there isn't any work to prevent it from happening, then I will bet that you will get reports of crashes (or whatever the behavior will be) from confused users. Hopefully only from dev machines and not from production. As for the rest, unfortunately I am short on time to work on this at the moment. Can you work on addressing the rest of the changes? This branch is allowing edits from maintainers, so if you want you can push new commits to it and then merge when desired. Thanks.
(force-pushed from 3a0ab34 to 4c00fe9)
Alright, I've made all those changes mentioned above. It's now pending review, and we'll merge this and cut a new minor version.
A few small changes and one thought on performance. Otherwise LGTM!
```diff
@@ -32,6 +32,31 @@
             metric_type: :counter,
             metric_settings: { some_setting: true })
         end.to raise_error(Prometheus::Client::DataStores::DirectFileStore::InvalidStoreSettingsError)

+      # :most_recent aggregation can only be used for gauges
```
I'd prefer this to be done with RSpec methods describing the tests rather than a single comment at the top of a bunch of `expect`s.
I'm not really sure what shape the code you're thinking of has, sorry. To me this is doing that; if we remove the comment, this is pretty much how the rest of the test case is written. Can you mock up a quick example of what you mean by "done with RSpec methods"?
```ruby
metric_store2.set(labels: { foo: "baz" }, val: 2)
allow(Process).to receive(:pid).and_return(12345)
metric_store1.set(labels: { foo: "baz" }, val: 4)
allow(Process).to receive(:pid).and_return(23456)
```
Not sure what these last two examples are here to test. I think all the behaviour is covered by the ones above?
Maybe I should leave only one of those two. What this is testing is a labelset that only exists in one of the stores; all the others are in both.
This reports the value that was set by a process most recently. The way this works is by tagging each value in the files with the timestamp of when they were set. For all existing aggregations, we ignore that timestamp and do what we've been doing so far. For `:most_recent`, we take the "maximum" entry according to its timestamp (i.e. the latest) and then return its value.

Signed-off-by: Stefan Sundin <stefan@stefansundin.com>
Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
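A minimal sketch of that aggregation, assuming each entry read from the files is a `[value, timestamp]` pair. Only the `:most_recent` and `SUM` branches are shown, and the names mirror the diff above, but this is not the exact merged code:

```ruby
def aggregate_values(values)
  # values: array of [value, timestamp] pairs, one per process file.
  if @values_aggregation_mode == MOST_RECENT
    # Latest timestamp wins; return that entry's value.
    values.max_by { |_value, timestamp| timestamp }.first
  elsif @values_aggregation_mode == SUM
    # Existing aggregations ignore the timestamp entirely.
    values.map(&:first).inject { |sum, value| sum + value }
  end
end
```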
Using this aggregation with any other metric type is almost certainly not what the user intended, and it'll result in counters that go up and down, and completely inconsistent histograms. Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
…egation If we do this, we'd be incrementing the value for *this* process, not the global one, which is almost certainly not what the user wants to do. This is not very pretty because we may end up raising an exception in production (as test/dev tend to not use DirectFileStore), but we consider it better than letting the user mangle their numbers and end up with incorrect metrics. Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
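Roughly, the guard this commit describes could look like the following; the exact error class, wording, and call site in the gem may differ:

```ruby
# Illustrative guard: inc/dec on a :most_recent gauge would only bump
# this process's value, not the global one, so fail loudly instead.
def validate_operation!(operation, aggregation)
  if aggregation == :most_recent && %i[increment decrement].include?(operation)
    raise ArgumentError,
          "#{operation} cannot be used with the :most_recent aggregation; use set instead"
  end
end
```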
The Monotonic clock is going to be more accurate on the few cases where the distinction matters, but it's also somehow faster than `Time.now`. Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
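For reference, these are the two clocks being compared (the commit above settles on the monotonic one):

```ruby
# Wall clock: can jump backwards/forwards with NTP adjustments.
wall_ts = Time.now.to_f

# Monotonic clock: steady, and comparable across processes on the same
# host since they share the clock; per the commit, also faster to read.
mono_ts = Process.clock_gettime(Process::CLOCK_MONOTONIC)
```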
Instead of two `read` operations, we can do both together at once Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
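A sketch of the combined read, assuming the value and the timestamp sit next to each other in the file as two 8-byte doubles (the offset handling here is illustrative):

```ruby
# One 16-byte read replaces two separate 8-byte reads.
file.seek(offset)
value, timestamp = file.read(16).unpack("d2")
```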
(force-pushed from 2c02e21 to 826b32e)
In the multiprocess mode, the process that exposes the metrics needs to aggregate the samples from other processes. Gauge metric allows users to choose the aggregation mode. This implements 'mostrecent' (and 'livemostrecent') mode where the last observed value is exposed.

In order to support this, the file format is expanded to store the timestamps in addition to the values. The stored timestamps are read by the reader process and used to find the latest value. The timestamp itself is exposed as a part of Prometheus exposition (https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md). This allows further aggregation across exporters.

Closes prometheus#847

Consideration on the atomicity: Previously, mmap_dict.py had a comment saying "We assume that reading from an 8 byte aligned value is atomic". With this change, the value write becomes a 16-byte, 8-byte-aligned write. The code author tried to find a basis for the original assumption, but couldn't find any. According to write(2), **if a file descriptor is shared**, the write becomes atomic. However, we do not share the file descriptors in the current architecture. Considering that the Ruby implementation also does the same and hasn't seen an issue with it, this write atomicity problem might practically not be an issue.

See also:
* prometheus/client_ruby#172 — The approach and naming are taken from client_ruby.
* https://github.com/prometheus/client_golang/blob/v1.17.0/prometheus/metric.go#L149-L161 — client_golang has an API for setting the timestamp already. It explains the use case for the timestamp beyond the client-local aggregation. In order to support the same use case in Python, further changes are needed.

Signed-off-by: Masaya Suzuki <draftcode@gmail.com>
Add `:most_recent` aggregation to DirectFileStore, which reports the value that was set by a process most recently.

I am using this for a small app called github-activity, and it tracks the remaining GitHub ratelimit using Prometheus (this is the only metric at the moment). The existing aggregations didn't really fit my use case, since the last reported value from GitHub is the one that I care about. So I decided to try to improve the gem to support my use case, and I came up with this.

Please let me know about nitpicks and other things. I wasn't sure what the best name would be for this aggregation, and I was basically picking between `:latest` and `:most_recent`. And I am unsure if this should be using `CLOCK_MONOTONIC` or if `Time.now.to_f` is OK. Thanks!
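For context, using the new aggregation would look roughly like this once merged. `store_settings` is the gem's existing mechanism for per-metric store options, but treat the exact shape as an assumption drawn from the PR discussion rather than the final API:

```ruby
require "prometheus/client"
require "prometheus/client/data_stores/direct_file_store"

Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: "/tmp/prometheus")

# Gauge whose exposed value is the one most recently set by any process.
ratelimit = Prometheus::Client::Gauge.new(
  :github_ratelimit_remaining,
  docstring: "Remaining GitHub API ratelimit",
  store_settings: { aggregation: :most_recent }
)

ratelimit.set(4987)
```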