Add :most_recent aggregation to DirectFileStore #172
Conversation
Hi @SuperQ @Sinjo @dmagliola. Feel free to let me know what you think. Thanks.

Thinking about how gauge values are handled in other languages, and how, in single-threaded mode, "most recent" is the standard way to do things, I wonder if this should be the default way multi-process gauges work.

It requires cross-process locking, and we're trying to avoid that on performance grounds.
First of all, thank you for this PR! This is a great contribution :-D
Finally, Merry Xmas to all of you :-)
```diff
@@ -182,13 +185,15 @@ def process_id

       def aggregate_values(values)
         if @values_aggregation_mode == SUM
-          values.inject { |sum, element| sum + element }
+          values.map { |element| element[0] }.inject { |sum, value| sum + value }
```
For all 3 of these, could we `map(&:first)` and then do the same thing we were doing before? That feels a lot more readable to me. (So: `values.map(&:first).inject { |sum, element| sum + element }`, `values.map(&:first).max`, etc.)

I'm half wondering why we did this `inject` before, instead of just calling `.sum`... Is there a large performance penalty to `map`? Could we measure that? The reason I mention that is that we already have some users with performance problems on scrape, and I'd thus like to make this as fast as possible. If `map` is much slower, we should probably change this first one to the same inject as before, but with the body of the block as `sum + element[0]`... And I'm not sure about the ones below, whether that's the fastest way?
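To put numbers behind that question, a quick micro-benchmark could compare the two shapes. This is a hypothetical sketch (not part of the PR), assuming the benchmark-ips gem and the `[value, timestamp]` entry format introduced here:

```ruby
require "benchmark/ips"

# Simulated store contents: one [value, timestamp] pair per entry.
values = Array.new(10_000) { [rand, rand] }

Benchmark.ips do |x|
  # Two passes: extract the values first, then reduce them.
  x.report("map + inject") { values.map(&:first).inject { |sum, v| sum + v } }

  # Single pass: reduce directly over the pairs.
  x.report("inject only")  { values.inject(0.0) { |sum, pair| sum + pair[0] } }

  x.compare!
end
```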
I am very open to changing this code. I mostly just wanted to get it working without considering performance that much.
That's a great thing to hear.
So my ratelimit starts at 5000 requests/hour, and decreases, that is correct. But the ratelimit is reset every hour, so if I used
Merry Christmas to all of you too! :)
I hadn't looked at the code, I was responding to @SuperQ's comment. The issue with this approach is that it only works for set, not inc/dec. Thus it cannot be the default.

I think banning inc/dec on this aggregation is probably the way to go. Hell, we could raise for the moment on inc/dec while allowing set, then add the inter-process locking for inc/dec later if people really want it.

If you want inc/dec then there's no need for inter-process coordination; each process can keep it locally and we sum on exposition.
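A sketch of what that last point describes, with illustrative names (not the gem's API): each process accumulates its own inc/dec total, and the exposing process sums them at scrape time, with no cross-process locking.

```ruby
# Hypothetical per-process totals keyed by pid.
per_process_totals = {
  12345 => 3.0,  # process 12345 called inc(3)
  23456 => -1.0, # process 23456 called dec(1)
}

# On exposition, no coordination is needed: just sum the local totals.
exposed_value = per_process_totals.values.sum # => 2.0
```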
That's a really good point from @lawrencejones. I like your solution at first sight, but I'm a bit worried about two things:
Side thought... Would / could this be solved by "Custom Collectors" instead?
I think either is fine for this use case. Each has its caveats, but I think
The caveat I had to read up on with
Hello @stefansundin,

So, I think we should merge this feature, but I'd like to ask for a few changes, because as it stands, it's quite possible to use this in a way that will do something the user definitely didn't mean. Could we have:

Notes on making this the default:

Thank you!
(force-pushed from 1a7dee8 to 207654d)
@dmagliola I dropped the commit with the filename prefix change. I'd highly recommend testing the behavior for loading files with the old format. If there isn't any work to prevent it from happening, then I will bet that you will get reports of crashes (or whatever the behavior will be) from confused users. Hopefully only from dev machines and not from production. As for the rest, unfortunately I am short on time to work on this at the moment. Can you work on addressing the rest of the changes? This branch is allowing edits from maintainers, so if you want you can push new commits to it and then merge when desired. Thanks.
(force-pushed from 3a0ab34 to 4c00fe9)
Alright, I've made all those changes mentioned above. It's now pending review, and we'll merge this and cut a new minor version.
A few small changes and one thought on performance. Otherwise LGTM!
```diff
@@ -32,6 +32,31 @@
             metric_type: :counter,
             metric_settings: { some_setting: true })
         end.to raise_error(Prometheus::Client::DataStores::DirectFileStore::InvalidStoreSettingsError)

+      # :most_recent aggregation can only be used for gauges
```
I'd prefer this to be done with RSpec methods describing the tests rather than a single comment at the top of a bunch of `expect`s.
I'm not really sure what shape the code you're thinking of has, sorry. To me this is doing that; if we remove the comment, this is pretty much how the rest of the test case is written. Can you mock up a quick example of what you mean by "done with RSpec methods"?
```ruby
metric_store2.set(labels: { foo: "baz" }, val: 2)
allow(Process).to receive(:pid).and_return(12345)
metric_store1.set(labels: { foo: "baz" }, val: 4)
allow(Process).to receive(:pid).and_return(23456)
```
Not sure what these last two examples are here to test. I think all the behaviour is covered by the ones above?
Maybe I should leave only one of those two. What this is testing is a labelset that only exists in one of the stores; all the others are in both.
This reports the value that was set by a process most recently. The way this works is by tagging each value in the files with the timestamp of when they were set. For all existing aggregations, we ignore that timestamp and do what we've been doing so far. For `:most_recent`, we take the "maximum" entry according to its timestamp (i.e. the latest) and then return its value.

Signed-off-by: Stefan Sundin <stefan@stefansundin.com>
Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
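A minimal sketch of that aggregation, assuming each entry read from the files is a `[value, timestamp]` pair. Only the `:most_recent` and `SUM` branches are shown, and the names mirror the diff above, but this is not the exact merged code:

```ruby
def aggregate_values(values)
  # values: array of [value, timestamp] pairs, one per process file.
  if @values_aggregation_mode == MOST_RECENT
    # Latest timestamp wins; return that entry's value.
    values.max_by { |_value, timestamp| timestamp }.first
  elsif @values_aggregation_mode == SUM
    # Existing aggregations ignore the timestamp entirely.
    values.map(&:first).inject { |sum, value| sum + value }
  end
end
```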
Using this aggregation with any other metric type is almost certainly not what the user intended, and it'll result in counters that go up and down, and completely inconsistent histograms. Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
…egation If we do this, we'd be incrementing the value for *this* process, not the global one, which is almost certainly not what the user wants to do. This is not very pretty because we may end up raising an exception in production (as test/dev tend to not use DirectFileStore), but we consider it better than letting the user mangle their numbers and end up with incorrect metrics. Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
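Roughly, the guard this commit describes could look like the following; the exact error class, wording, and call site in the gem may differ:

```ruby
# Illustrative guard: inc/dec on a :most_recent gauge would only bump
# this process's value, not the global one, so fail loudly instead.
def validate_operation!(operation, aggregation)
  if aggregation == :most_recent && %i[increment decrement].include?(operation)
    raise ArgumentError,
          "#{operation} cannot be used with the :most_recent aggregation; use set instead"
  end
end
```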
The Monotonic clock is going to be more accurate on the few cases where the distinction matters, but it's also somehow faster than `Time.now`. Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
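For reference, these are the two clocks being compared (the commit above settles on the monotonic one):

```ruby
# Wall clock: can jump backwards/forwards with NTP adjustments.
wall_ts = Time.now.to_f

# Monotonic clock: steady, and comparable across processes on the same
# host since they share the clock; per the commit, also faster to read.
mono_ts = Process.clock_gettime(Process::CLOCK_MONOTONIC)
```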
Instead of two `read` operations, we can do both together at once Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
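A sketch of the combined read, assuming the value and the timestamp sit next to each other in the file as two 8-byte doubles (the offset handling here is illustrative):

```ruby
# One 16-byte read replaces two separate 8-byte reads.
file.seek(offset)
value, timestamp = file.read(16).unpack("d2")
```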
(force-pushed from 2c02e21 to 826b32e)
In the multiprocess mode, the process that exposes the metrics needs to aggregate the samples from other processes. Gauge metric allows users to choose the aggregation mode. This implements 'mostrecent' (and 'livemostrecent') mode where the last observed value is exposed.

In order to support this, the file format is expanded to store the timestamps in addition to the values. The stored timestamps are read by the reader process and used to find the latest value. The timestamp itself is exposed as a part of Prometheus exposition (https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md). This allows further aggregation across exporters.

Closes prometheus#847

Consideration on the atomicity: Previously, mmap_dict.py had a comment saying "We assume that reading from an 8 byte aligned value is atomic". With this change, the value write becomes a 16-byte, 8-byte-aligned write. The code author tried to find a basis for the original assumption, but couldn't find any. According to write(2), **if a file descriptor is shared**, the write becomes atomic. However, we do not share the file descriptors in the current architecture. Considering that the Ruby implementation also does the same and hasn't seen an issue with it, this write atomicity problem might practically not be an issue.

See also:
* prometheus/client_ruby#172 — The approach and naming are taken from client_ruby.
* https://github.com/prometheus/client_golang/blob/v1.17.0/prometheus/metric.go#L149-L161 — client_golang has an API for setting the timestamp already. It explains the use case for the timestamp beyond the client-local aggregation. In order to support the same use case in Python, further changes are needed.

Signed-off-by: Masaya Suzuki <draftcode@gmail.com>
Add `:most_recent` aggregation to DirectFileStore, which reports the value that was set by a process most recently.

I am using this for a small app called github-activity, and it tracks the remaining GitHub ratelimit using Prometheus (this is the only metric at the moment). The existing aggregations didn't really fit my use case, since the last reported value from GitHub is the one that I care about. So I decided to try to improve the gem to support my use case, and I came up with this.

Please let me know about nitpicks and other things. I wasn't sure what the best name would be for this aggregation, and I was basically picking between `:latest` and `:most_recent`. And I am unsure if this should be using `CLOCK_MONOTONIC` or if `Time.now.to_f` is OK. Thanks!
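For context, using the new aggregation would look roughly like this once merged. `store_settings` is the gem's existing mechanism for per-metric store options, but treat the exact shape as an assumption drawn from the PR discussion rather than the final API:

```ruby
require "prometheus/client"
require "prometheus/client/data_stores/direct_file_store"

Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: "/tmp/prometheus")

# Gauge whose exposed value is the one most recently set by any process.
ratelimit = Prometheus::Client::Gauge.new(
  :github_ratelimit_remaining,
  docstring: "Remaining GitHub API ratelimit",
  store_settings: { aggregation: :most_recent }
)

ratelimit.set(4987)
```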