
Store and query classic histograms as native histograms with custom buckets #31

Merged: 9 commits into prometheus:main, Jan 30, 2024

Conversation

@krajorama (Member) commented Jan 26, 2024:

To support RW Self-Contained Histograms, which addresses the need to make writing histograms atomic, in particular to avoid situations where the series of a classic histogram are only partially written to (remote) storage. For more information, consult the referenced design document.

To make storing classic histograms more efficient by taking advantage of the design of native histograms.

Finally, fully custom bucket layouts are a larger project with a wider scope. By reducing the scope, we can have a shorter development cycle and deliver a useful feature and storage savings sooner.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@bwplotka (Member) commented:

Nice! Will look in ~1h, thanks! 💪🏽

@bwplotka (Member) left a comment:

Nice! LGTM, some nits only.

krajorama and others added 3 commits January 26, 2024 12:04
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Copy over options 1,2,3 from original.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@bwplotka (Member) left a comment:

Good to go from my side

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@SuperQ (Member) left a comment:

While I see the desire to provide scrape-config-level flexibility, from an operator perspective I would prefer this to be a global, per-Prometheus option. I would not want inconsistent query or data behavior on the same instance of Prometheus, with some jobs emitting classic histograms and some jobs emitting native histograms. It's hard enough already with the existing classic histograms.

## Goals

* No change to instrumentation.
* No change to the query side. *Might not be achieved in the first iteration/ever. The ingestion and storage part can be fully implemented without any changes to the query part. A compatibility layer for querying can be introduced later as needed.*
A Member commented:

I almost feel like this should be a non-goal. If we're going to convert classic to native, I feel like users who opt in to such a feature would want all the normal behaviors of native histograms.

@beorn7 (Member) commented Jan 30, 2024:

Let's introduce a 3rd section (next to Goals and Non-goals): Maybe-goals. :)

krajorama and others added 2 commits January 29, 2024 10:36
…grams.md

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
…grams.md

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
@bwplotka merged commit 681b64d into prometheus:main on Jan 30, 2024. 2 checks passed.
@SuperQ (Member) commented Jan 30, 2024:

Nice


To make storing classic histograms more efficient by taking advantage of the design of native histograms.

Finally, fully custom bucket layouts are a larger project with a wider scope. By reducing the scope, we can have a shorter development cycle and deliver a useful feature and storage savings sooner.
A Member commented:

Hmm, so actually this implements the "hardest" case where the boundaries are fully specified explicitly. It is hard, because it requires additional sub-sections in the existing native histogram data structures.

The "easy" ones are new schemas that have a simple rule to calculate bucket boundaries. For them, we just need to pick a new schema number between -128 and +127 that isn't taken yet and add the code to calculate the bucket boundaries (and maybe how to handle arithmetic with mixed-schema histograms). For example, "log-linear with 90 buckets per power of 10" could be schema 90. Then we just need some convention for how to translate a bucket index number into a particular bucket in this log-linear schema (see the sketch after this comment).

Purely linear is a bit harder because we probably want an arbitrary absolute increase from bucket to bucket, but we could recycle the bucket boundary data structure from this design doc.

In other words, I think if we implement the custom buckets described here carefully, it will be very easy to add other schemas like the examples above.

And I'm writing all of this here because

  1. I want you to apply that care
  2. it makes this whole effort even better because it unlocks the other bucketing schemas (that I have been asked for already at conferences).
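To make the "easy" case above concrete, here is a minimal Go sketch of an index-to-boundary rule for the hypothetical schema 90. The indexing convention (bucket 0 covering (1.0, 1.1]) is invented for illustration and is not part of the proposal:

```go
package main

import (
	"fmt"
	"math"
)

// upperBound returns the upper boundary of bucket i for a hypothetical
// log-linear schema with 90 equal-width buckets per power of 10.
// Convention (illustrative only): bucket 0 is (1.0, 1.1], bucket 89 closes
// the decade at 10.0, bucket -1 is (0.99..., 1.0], and so on.
func upperBound(i int) float64 {
	d := math.Floor(float64(i) / 90)        // which power of 10 we are in
	p := float64(i) - 90*d                  // position within the decade, 0..89
	return math.Pow(10, d) * (1 + (p+1)/10) // 90 linear steps of width 0.1*10^d
}

func main() {
	fmt.Println(upperBound(0))  // 1.1
	fmt.Println(upperBound(89)) // 10
	fmt.Println(upperBound(90)) // 11
	fmt.Println(upperBound(-1)) // 1
}
```

With a rule like this, no per-series bucket layout needs to be stored at all: the schema number alone determines every boundary.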

## Non-Goals

* New instrumentation for defining the custom buckets.
* Interoperability with (exponential bucket) native histograms.
A Member commented:

That could be a "maybe-goal", too.

I.e. in the (unlikely) case that a bucketing schema from a classic histogram is convertible into something that can be merged with an exponential native histogram, the PromQL engine could detect that and do the merge.


Enhance the internal representation of histograms (both float and [integer](https://github.com/prometheus/prometheus/blob/main/model/histogram/histogram.go)) with a nil-able slice of custom bucket definitions. No need to change spans/deltas/values slices.

The counters for the custom buckets should be stored as integer values if possible, to stay compatible with the existing precision of the classic histogram representation within a to-be-defined 𝜎. The Go expression `x == math.Trunc(x)` has an error of around `1e-16`, determined experimentally.
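As a rough illustration of the proposed change, here is a sketch of the integer flavor. Type names loosely follow the public `model/histogram` package; the field name `CustomValues` is an assumption, not a settled API:

```go
package histogram

// Span is as in the existing native histogram model: a run of consecutive
// buckets, at an offset from the end of the previous span.
type Span struct {
	Offset int32
	Length uint32
}

// Histogram sketches the proposed extension: existing fields stay as they
// are, and a nil-able slice of custom bucket definitions is added.
type Histogram struct {
	Schema          int32  // a reserved schema number would signal custom buckets
	Count           uint64
	Sum             float64
	PositiveSpans   []Span // spans/deltas/values slices are reused unchanged
	PositiveBuckets []int64
	// ... further existing fields elided ...

	// CustomValues lists the upper bounds of the custom buckets, excluding
	// the implicit +Inf. It is nil for ordinary exponential histograms.
	CustomValues []float64
}
```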
A Member commented:

Hmm, is that really required?

For one, exposition formats make it quite clear if a histogram is integer or float. It is explicit in protobuf. OpenMetrics text format currently doesn't even support float histograms. And assuming it does so in the future, the floats are explicitly marked by the presence of dots.

Classic Prometheus text format isn't explicit about it, but we could still say we ingest as a float histogram if at least one bucket or the count has a dot with non-zero digits after it.

There is the edge case of explicitly marked float histograms that are effectively integer histograms because all contained floats have only zeros after the floating point. Currently, we store those histograms as float histograms if they are native histograms, and that's probably fine as it is rare. But we do have the plan to optimize that case and opportunistically convert them to an integer histogram. We would only do so if every involved float is effectively an integer.

The way IEEE 754 floats work, this is always clear. We don't need any rounding. If you convert an integer to a float, then math.Trunc(x) == x, even if you are beyond the range where float64 represents each integer precisely (±2^53).

This is also how we did it in Prometheus v1 (which opportunistically stored the simple float64 samples as integers if they were effectively (and precisely) integers).
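A small Go sketch of this opportunistic check, under the rule above that every involved float must be precisely integer-valued before converting (helper names are invented for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// isEffectivelyInteger reports whether x holds an exact integer value.
// No rounding tolerance is needed: a float64 that came from an integer
// always satisfies math.Trunc(x) == x, even beyond ±2^53.
func isEffectivelyInteger(x float64) bool {
	return !math.IsInf(x, 0) && !math.IsNaN(x) && x == math.Trunc(x)
}

// canConvertToIntegerHistogram checks every involved float. The sum is
// excluded, as the sum of observations is genuinely a float.
func canConvertToIntegerHistogram(count float64, buckets []float64) bool {
	if !isEffectivelyInteger(count) {
		return false
	}
	for _, b := range buckets {
		if !isEffectivelyInteger(b) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canConvertToIntegerHistogram(42, []float64{1, 2, 39}))   // true
	fmt.Println(canConvertToIntegerHistogram(42.5, []float64{1, 2, 39})) // false
}
```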

### Scrape configuration

1. The feature is disabled if the feature flag [`native-histograms`](https://prometheus.io/docs/prometheus/latest/feature_flags/#native-histograms) is disabled.
2. If the native histograms feature is enabled, custom histograms can be enabled in `scrape_config` by setting the configuration option `convert_classic_histograms` to `true`, for example as sketched below.
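For illustration, a sketch of how this could look in a configuration file (the nesting of the proposed `convert_classic_histograms` key follows this proposal and may change; the job and target are made up):

```yaml
# Requires the native histograms feature flag:
#   prometheus --enable-feature=native-histograms ...
scrape_configs:
  - job_name: "classic-app"
    # Proposed option: convert scraped classic histograms into
    # native histograms with custom buckets.
    convert_classic_histograms: true
    static_configs:
      - targets: ["app.example.com:8080"]
```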
A Member commented:

To address @SuperQ's comment: I think there should definitely be a per-scrape-config option, but we could also provide a global option for those who always want the feature.


### Remote write protocol

The `message.Histogram` type is to be expanded with a nullable repeated `custom_buckets` field that lists the custom bucket definitions (except `+Inf`, which is implicit). There should be a comment specifying which schema number indicates that this field must be consulted. It should be a validation error for this field to be null when the custom bucket schema number is used.
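For illustration, a sketch of the protobuf addition. Existing fields are elided, all field numbers here are placeholders, and only the name `custom_buckets` comes from this proposal:

```proto
message Histogram {
  // ... existing fields (count, sum, spans, deltas, ...) elided ...

  // A reserved, to-be-assigned schema value signals: consult custom_buckets.
  sint32 schema = 4;

  // Upper bounds of the custom buckets, excluding the implicit +Inf bucket
  // (assumed to be plain upper bounds, hence double). It is a validation
  // error for this to be empty when the custom-bucket schema value is used.
  repeated double custom_buckets = 99;
}
```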
A Member commented:

It should be noted that we don't have to specify that the last boundary is +Inf (as that's always the case) but we should store a count for the final +Inf bucket. In current classic histograms, that value is always equal to the count value, but in native histogram, we specify that observations of NaN increase the count, but not any bucket.

Once the custom buckets are implemented, we can in principle use them as pure native histograms with custom buckets on the instrumentation side, and then we would also get this behavior.

(In different news, I think it is a design flaw of classic histograms that they effectively count NaN observations in the +Inf bucket. On the other hand, it doesn't really matter in practice, but now that it is cleanly implemented in native histograms, we should keep that behavior throughout.)

* Would we ever want to store the old representation and the new one at the same time?
*Answer:* YES. This should already work via the existing `scrape_classic_histograms` option.
* What to do in queries if a custom histogram and an exponential histogram meet, or a custom histogram and a float sample?
*Answer:* Same as today with float vs. native histogram: calculate the result if it makes mathematical sense. For example, multiplying a custom histogram by the number 2.0 makes sense. In the case of two histograms, they need to be rescaled to match their schemas.
A Member commented:

"rescaled" is confusing (as we also use it when we multiply a histogram by a float). Maybe "change resolution" or something. It should also be added that this might not work precisely, and that in that case we should rather give a warning and skip the calculation instead of doing something wonky.

Examples (not necessarily to add here, just to create understanding for the reviewers):

  • Custom histogram with boundaries 1, 2, 4, 8 and no sample in the +Inf bucket → can be merged with any exponential histogram.
  • Anything in the +Inf bucket: Doesn't work with exponential histogram.
  • Custom histogram with boundaries 1, 5, 10, 20, +Inf added to another custom histogram with 1, 10, 20, +Inf: We can do the math by first merging the 5-bucket and the 10-bucket together in the first histogram and then add both, resulting in a 1, 10, 20 histogram.
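To illustrate the third example, here is a hedged Go sketch of re-bucketing a finer custom layout into a coarser one whose bounds form a subset of the finer bounds. The non-cumulative per-bucket count representation and the function name are assumptions for illustration:

```go
package main

import "fmt"

// mergeToCoarser re-buckets non-cumulative per-bucket counts from a fine
// layout of sorted upper bounds into a coarser layout whose bounds are a
// subset of the fine ones. The implicit +Inf bucket is handled separately.
func mergeToCoarser(fine, coarse []float64, counts []uint64) ([]uint64, error) {
	out := make([]uint64, len(coarse))
	j := 0
	for i, ub := range fine {
		if j >= len(coarse) || ub > coarse[j] {
			return nil, fmt.Errorf("bound %g has no home in the coarse layout", ub)
		}
		out[j] += counts[i]
		if ub == coarse[j] {
			j++ // this coarse bucket is complete, move to the next
		}
	}
	return out, nil
}

func main() {
	// The example above: 1, 5, 10, 20 merged into 1, 10, 20 by folding the
	// 5-bucket into the 10-bucket.
	merged, err := mergeToCoarser(
		[]float64{1, 5, 10, 20}, []float64{1, 10, 20},
		[]uint64{3, 4, 2, 1},
	)
	fmt.Println(merged, err) // [3 6 1] <nil>
}
```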

* Should we use a bigger chunk size for such custom histograms? To offset that we’d want to store the bucket layout in the chunk header. ~4K?
*Answer:* NO. Classic histograms typically have less buckets than exponential native histograms which should offset any additional information encoded in the chunk.
A Member commented:

Suggested change
*Answer:* NO. Classic histograms typically have less buckets than exponential native histograms which should offset any additional information encoded in the chunk.
*Answer:* NO. Classic histograms typically have fewer buckets than exponential native histograms, which should offset any additional information encoded in the chunk.

@beorn7 (Member) commented Jan 30, 2024:

This was merged during my review.

I recommend still reading my thoughtful comments and applying them as needed. 🙏

krajorama added a commit to krajorama/prometheus-proposals that referenced this pull request Jan 31, 2024
Clarifications and updates for late comments to prometheus#31.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>