
Store and query classic histograms as native histograms with custom buckets #31

Merged: 9 commits into prometheus:main, Jan 30, 2024

Conversation

@krajorama (Member) commented Jan 26, 2024:

To support RW Self-Contained Histograms, which addresses the need to make writing histograms atomic, in particular to avoid situations where the series of a classic histogram are only partially written to (remote) storage. For more information, consult the referenced design document.

To make storing classic histograms more efficient by taking advantage of the design of native histograms.

Finally, fully custom bucket layouts are a larger project with a wider scope. By reducing the scope, we can have a shorter development cycle and deliver a useful feature and storage savings sooner.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@bwplotka (Member) commented:

Nice! Will look in ~1h, thanks! 💪🏽

@bwplotka (Member) left a comment:

Nice! LGTM, some nits only.

krajorama and others added 3 commits January 26, 2024 12:04
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Copy over options 1,2,3 from original.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@bwplotka (Member) left a comment:

Good to go from my side

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@SuperQ (Member) left a comment:

While I see the desire to provide scrape-config-level flexibility, from an operator perspective I would prefer this to be a global, per-Prometheus option. I would not want inconsistent query or data behavior on the same instance of Prometheus, with some jobs emitting classic histograms and some jobs emitting native histograms. It's hard enough already with the existing classic histograms.

## Goals

* No change to instrumentation.
* No change to the query side. *Might not be achieved in the first iteration/ever. The ingestion and storage part can be fully implemented without any changes to the query part. A compatibility layer for querying can be introduced later as needed.*
A Member commented:

I almost feel like this should be a non-goal. If we're going to convert classic to native, I feel like users who opt in to such a feature would want all the normal behaviors of native histograms.

@beorn7 (Member) commented Jan 30, 2024:

Let's introduce a 3rd section (next to Goals and Non-goals): Maybe-goals. :)

krajorama and others added 2 commits January 29, 2024 10:36
…grams.md

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
…grams.md

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
@bwplotka merged commit 681b64d into prometheus:main on Jan 30, 2024. 2 checks passed.
@SuperQ (Member) commented Jan 30, 2024:

Nice


To make storing classic histograms more efficient by taking advantage of the design of native histograms.

Finally, fully custom bucket layouts are a larger project with a wider scope. By reducing the scope, we can have a shorter development cycle and deliver a useful feature and storage savings sooner.
A Member commented:

Hmm, so actually this implements the "hardest" case where the boundaries are fully specified explicitly. It is hard, because it requires additional sub-sections in the existing native histogram data structures.

The "easy" ones are new schemas that have a simple rule to calculate bucket boundaries. For them, we just need to pick a new schema number between -128 and +127 that isn't taken yet and add the code to calculate the bucket boundaries (and maybe how to handle arithmetic with mixed-schema histograms). For example, "log-linear with 90 buckets per power of 10" could be schema 90. Then we just need some convention for how to translate a bucket index number into a particular bucket in this log-linear schema (see the sketch after this comment).

Purely linear is a bit harder because we probably want an arbitrary absolute increase from bucket to bucket, but we could recycle the bucket boundary data structure from this design doc.

In other words, I think if we implement the custom buckets described here carefully, it will be very easy to add other schemas like the examples above.

And I'm writing all of this here because

  1. I want you to apply that care
  2. it makes this whole effort even better because it unlocks the other bucketing schemas (that I have been asked for already at conferences).
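To make the "easy" case above concrete, here is a minimal Go sketch of an index-to-boundary rule for the hypothetical schema 90. The indexing convention (bucket 0 covering (1.0, 1.1]) is invented for illustration and is not part of the proposal:

```go
package main

import (
	"fmt"
	"math"
)

// upperBound returns the upper boundary of bucket i for a hypothetical
// log-linear schema with 90 equal-width buckets per power of 10.
// Convention (illustrative only): bucket 0 is (1.0, 1.1], bucket 89 closes
// the decade at 10.0, bucket -1 is (0.99..., 1.0], and so on.
func upperBound(i int) float64 {
	d := math.Floor(float64(i) / 90)        // which power of 10 we are in
	p := float64(i) - 90*d                  // position within the decade, 0..89
	return math.Pow(10, d) * (1 + (p+1)/10) // 90 linear steps of width 0.1*10^d
}

func main() {
	fmt.Println(upperBound(0))  // 1.1
	fmt.Println(upperBound(89)) // 10
	fmt.Println(upperBound(90)) // 11
	fmt.Println(upperBound(-1)) // 1
}
```

With a rule like this, no per-series bucket layout needs to be stored at all: the schema number alone determines every boundary.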

## Non-Goals

* New instrumentation for defining the custom buckets.
* Interoperability with (exponential bucket) native histograms.
A Member commented:

That could be a "maybe-goal", too.

I.e. in the (unlikely) case that a bucketing schema from a classic histogram is convertible into something that can be merged with an exponential native histogram, the PromQL engine could detect that and do the merge.


Enhance the internal representation of histograms (both float and [integer](https://github.com/prometheus/prometheus/blob/main/model/histogram/histogram.go)) with a nil-able slice of custom bucket definitions. No need to change spans/deltas/values slices.

The counters for the custom buckets should be stored as integer values if possible, to stay compatible with the existing precision of the classic histogram representation within a to-be-defined 𝜎. The Go expression `x == math.Trunc(x)` has an error of around `1e-16`, determined experimentally.
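As a rough illustration of the proposed change, here is a sketch of the integer flavor. Type names loosely follow the public `model/histogram` package; the field name `CustomValues` is an assumption, not a settled API:

```go
package histogram

// Span is as in the existing native histogram model: a run of consecutive
// buckets, at an offset from the end of the previous span.
type Span struct {
	Offset int32
	Length uint32
}

// Histogram sketches the proposed extension: existing fields stay as they
// are, and a nil-able slice of custom bucket definitions is added.
type Histogram struct {
	Schema          int32  // a reserved schema number would signal custom buckets
	Count           uint64
	Sum             float64
	PositiveSpans   []Span // spans/deltas/values slices are reused unchanged
	PositiveBuckets []int64
	// ... further existing fields elided ...

	// CustomValues lists the upper bounds of the custom buckets, excluding
	// the implicit +Inf. It is nil for ordinary exponential histograms.
	CustomValues []float64
}
```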
A Member commented:

Hmm, is that really required?

For one, exposition formats make it quite clear if a histogram is integer or float. It is explicit in protobuf. OpenMetrics text format currently doesn't even support float histograms. And assuming it does so in the future, the floats are explicitly marked by the presence of dots.

Classic Prometheus text format isn't explicit about it, but we could still say we ingest as a float histogram if at least one bucket or the count has a dot with non-zero digits after it.

There is the edge case of explicitly marked float histograms that are effectively integer histograms because all contained floats have only zeros after the floating point. Currently, we store those histograms as float histograms if they are native histograms, and that's probably fine as it is rare. But we do have the plan to optimize that case and opportunistically convert them to an integer histogram. We would only do so if every involved float is effectively an integer.

The way IEEE 754 floats work, this is always clear. We don't need any rounding. If you convert an integer to a float, then math.Trunc(x) == x, even if you are beyond the range where float64 represents each integer precisely (±2^53).

This is also how we did it in Prometheus v1 (which opportunistically stored the simple float64 samples as integers if they were effectively (and precisely) integers).
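A small Go sketch of this opportunistic check, under the rule above that every involved float must be precisely integer-valued before converting (helper names are invented for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// isEffectivelyInteger reports whether x holds an exact integer value.
// No rounding tolerance is needed: a float64 that came from an integer
// always satisfies math.Trunc(x) == x, even beyond ±2^53.
func isEffectivelyInteger(x float64) bool {
	return !math.IsInf(x, 0) && !math.IsNaN(x) && x == math.Trunc(x)
}

// canConvertToIntegerHistogram checks every involved float. The sum is
// excluded, as the sum of observations is genuinely a float.
func canConvertToIntegerHistogram(count float64, buckets []float64) bool {
	if !isEffectivelyInteger(count) {
		return false
	}
	for _, b := range buckets {
		if !isEffectivelyInteger(b) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canConvertToIntegerHistogram(42, []float64{1, 2, 39}))   // true
	fmt.Println(canConvertToIntegerHistogram(42.5, []float64{1, 2, 39})) // false
}
```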

### Scrape configuration

1. The feature is disabled if the feature flag [`native-histograms`](https://prometheus.io/docs/prometheus/latest/feature_flags/#native-histograms) is disabled.
2. If the native histograms feature is enabled, custom histograms can be enabled in `scrape_config` by setting the configuration option `convert_classic_histograms` to `true`, for example as sketched below.
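For illustration, a sketch of how this could look in a configuration file (the nesting of the proposed `convert_classic_histograms` key follows this proposal and may change; the job and target are made up):

```yaml
# Requires the native histograms feature flag:
#   prometheus --enable-feature=native-histograms ...
scrape_configs:
  - job_name: "classic-app"
    # Proposed option: convert scraped classic histograms into
    # native histograms with custom buckets.
    convert_classic_histograms: true
    static_configs:
      - targets: ["app.example.com:8080"]
```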
A Member commented:

To address @SuperQ's comment: I think there should definitely be a per-scrape-config option, but we could also provide a global option for those who always want the feature.


### Remote write protocol

The `message.Histogram` type is to be expanded with a nullable repeated `custom_buckets` field that lists the custom bucket definitions (except `+Inf`, which is implicit). There should be a comment specifying which schema number indicates that this field must be consulted. It should be a validation error for this field to be null when the custom bucket schema number is used.
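For illustration, a sketch of the protobuf addition. Existing fields are elided, all field numbers here are placeholders, and only the name `custom_buckets` comes from this proposal:

```proto
message Histogram {
  // ... existing fields (count, sum, spans, deltas, ...) elided ...

  // A reserved, to-be-assigned schema value signals: consult custom_buckets.
  sint32 schema = 4;

  // Upper bounds of the custom buckets, excluding the implicit +Inf bucket
  // (assumed to be plain upper bounds, hence double). It is a validation
  // error for this to be empty when the custom-bucket schema value is used.
  repeated double custom_buckets = 99;
}
```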
A Member commented:

It should be noted that we don't have to specify that the last boundary is +Inf (as that's always the case) but we should store a count for the final +Inf bucket. In current classic histograms, that value is always equal to the count value, but in native histogram, we specify that observations of NaN increase the count, but not any bucket.

Once the custom buckets are implemented, we can in principle use them as pure native histograms with custom buckets on the instrumentation side, and then we would also get this behavior.

(In different news, I think it is a design flaw of classic histograms that they effectively count NaN observations in the +Inf bucket. On the other hand, it doesn't really matter in practice, but now that it is cleanly implemented in native histograms, we should keep that behavior throughout.)

* Would we ever want to store the old representation and the new one at the same time?
*Answer:* YES. This should already work via the existing `scrape_classic_histograms` option.
* What to do in queries if a custom histogram and an exponential histogram meet, or a custom histogram and a float sample?
*Answer:* Same as today with float vs. native histogram: calculate the result if it makes mathematical sense. For example, multiplying a custom histogram by the number 2.0 makes sense. In the case of two histograms, they need to be rescaled to match their schemas.
A Member commented:

"rescaled" is confusing (as we also use it when we multiply a histogram by a float). Maybe "change resolution" or something. It should also be added that this might not work precisely, and that in that case we should rather give a warning and skip the calculation instead of doing something wonky.

Examples (not necessarily to add here, just to create understanding for the reviewers):

  • Custom histogram with boundaries 1, 2, 4, 8 and no sample in the +Inf bucket → can be merged with any exponential histogram.
  • Anything in the +Inf bucket: Doesn't work with exponential histogram.
  • Custom histogram with boundaries 1, 5, 10, 20, +Inf added to another custom histogram with 1, 10, 20, +Inf: We can do the math by first merging the 5-bucket and the 10-bucket together in the first histogram and then add both, resulting in a 1, 10, 20 histogram.
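To illustrate the third example, here is a hedged Go sketch of re-bucketing a finer custom layout into a coarser one whose bounds form a subset of the finer bounds. The non-cumulative per-bucket count representation and the function name are assumptions for illustration:

```go
package main

import "fmt"

// mergeToCoarser re-buckets non-cumulative per-bucket counts from a fine
// layout of sorted upper bounds into a coarser layout whose bounds are a
// subset of the fine ones. The implicit +Inf bucket is handled separately.
func mergeToCoarser(fine, coarse []float64, counts []uint64) ([]uint64, error) {
	out := make([]uint64, len(coarse))
	j := 0
	for i, ub := range fine {
		if j >= len(coarse) || ub > coarse[j] {
			return nil, fmt.Errorf("bound %g has no home in the coarse layout", ub)
		}
		out[j] += counts[i]
		if ub == coarse[j] {
			j++ // this coarse bucket is complete, move to the next
		}
	}
	return out, nil
}

func main() {
	// The example above: 1, 5, 10, 20 merged into 1, 10, 20 by folding the
	// 5-bucket into the 10-bucket.
	merged, err := mergeToCoarser(
		[]float64{1, 5, 10, 20}, []float64{1, 10, 20},
		[]uint64{3, 4, 2, 1},
	)
	fmt.Println(merged, err) // [3 6 1] <nil>
}
```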

* Should we use a bigger chunk size for such custom histograms? To offset that we’d want to store the bucket layout in the chunk header. ~4K?
*Answer:* NO. Classic histograms typically have less buckets than exponential native histograms which should offset any additional information encoded in the chunk.
A Member commented:

Suggested change
*Answer:* NO. Classic histograms typically have less buckets than exponential native histograms which should offset any additional information encoded in the chunk.
*Answer:* NO. Classic histograms typically have fewer buckets than exponential native histograms, which should offset any additional information encoded in the chunk.

@beorn7 (Member) commented Jan 30, 2024:

This was merged during my review.

I recommend still reading my thoughtful comments and applying them as needed. 🙏

krajorama added a commit to krajorama/prometheus-proposals that referenced this pull request Jan 31, 2024
Clarifications and updates for late comments to prometheus#31.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>