Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

histograms: Support exemplars in pure Native Histograms #1126

Closed
beorn7 opened this issue Sep 6, 2022 · 11 comments · Fixed by #1471
Closed

histograms: Support exemplars in pure Native Histograms #1126

beorn7 opened this issue Sep 6, 2022 · 11 comments · Fixed by #1471
Assignees

Comments

@beorn7
Copy link
Member

beorn7 commented Sep 6, 2022

This depends on prometheus/client_model#61 .

Exemplars can be added to the buckets of conventional histograms but not to native histograms. So far, the way of sending exemplars is by also rendering a conventional histogram. With the issue above fixed, we have a way of sending exemplars independently of conventional buckets. However, without conventional buckets, we don't have a heuristics which exemplars to pick. Native histograms can have so many buckets that adding an examplar to each is infeasible.

This is P2 as explained in the milestone's description.

@beorn7 beorn7 added this to the Native Histograms milestone Sep 6, 2022
@krajorama
Copy link
Member

Idea: keep a schema for the exemplars. At the beginning the exemplar schema would be the same as the buckets schema.

The exemplar schema "buckets" will cover the same range of values as the normal buckets, but at a much lower resolution. We could keep a maximum N exemplars per exemplar schema bucket (M). So max MxN. We'd automatically get selection of sampled exemplars per logarithmic range.

@beorn7
Copy link
Member Author

beorn7 commented Nov 9, 2023

Note that exemplars directly added to native histograms MUST have a timestamp, see prometheus/client_model#61 and prometheus/prometheus#13021 for details.

@fatsheep9146
Copy link
Contributor

Since prometheus/client_model#61 is solved, I would love to take this issue. If ok, you can assign to me @beorn7 @carrieedwards

@beorn7
Copy link
Member Author

beorn7 commented Jan 24, 2024

Thank you @fatsheep9146 . I have assigned this issue to you. Feel free to discuss selection heuristics on Slack, if you need inspiration beyond what has been discussed above.

@carrieedwards carrieedwards moved this from Todo to Assigned in Prometheus 3.0 Jan 24, 2024
@fatsheep9146
Copy link
Contributor

Before I support exemplars in pure native histogram in client golang, should we release the client_model with a new tag v0.6.0 then update the dependency in client golang to use the new client model? @beorn7

@beorn7
Copy link
Member Author

beorn7 commented Feb 16, 2024

True. I'll add the new tag ASAP (hopefully later today).

@fatsheep9146
Copy link
Contributor

The dependency in prometheus server should also be updated, I will also do this after the new tag is ready. @beorn7

@beorn7
Copy link
Member Author

beorn7 commented Feb 16, 2024

The tag is in place! https://github.com/prometheus/client_model/releases/tag/v0.6.0

@beorn7
Copy link
Member Author

beorn7 commented Feb 21, 2024

Broad idea for the heuristics:

  • Let the user configure a number of exemplars n to expose per histogram (default could be around 10).
  • The first n ObserveWithExemplar calls create an exemplar that we keep for now.
  • Whenever an ObserveWithExemplar call happens after that, first add the new exemplar to the current set of exemplars, and then delete one according to the following priority list:
    1. The oldest exemplar that is older than a certain threshold (maybe 5m, could also be configurable).
    2. Find the two exemplars that are closest to each other on an exponential scale. Of those, delete the older one.
  • A reset of the histogram should also delete all exemplars (so that we never expose an exemplar that doesn't fall into any of the populated buckets).

This should lead to an even spread of exemplars, while not keeping any individual exemplar for overly long.

@fatsheep9146
Copy link
Contributor

"Find the two exemplars that are closest to each other on an exponential scale."
How should I understand this sentence?
Does it mean that I should first take the logarithm of the values in all exemplars, and compare logarithm of newly added exemplar to all of existing exemplars, found the exemplar which has the closest logarithm with newly added exemplar?

For example, there exists two exemplars which has value 4, 6, the histogram schema is 0, the max count of exemplar is 2.
When we add a new exemplar which value is 5, we should compare the diff of logarithm with existing exemplar
log2(5) - log2(4) = 0.321
log2(6) - log2(5) = 0.263

So we should drop 6, keep 4, 5. Do I understand this correctly?

@beorn7

@beorn7
Copy link
Member Author

beorn7 commented Feb 23, 2024

Yes, sounds about right.

A naive implementation of this could be quite expensive, though, especially if many exemplars are to be kept around. However, for one I assume the number of exemplars will be generally small (around 10?). Also, you can probably play many tricks, like the following:

  • Keep the logarithm of each exemplar around.
  • Maintain a list of exemplars sorted by value (so finding the two closest exemplars, it's O(n), not O(n²)).
  • Maintain a list of exemplars sorted by time (so it's easy to find the oldest exemplar).

@carrieedwards carrieedwards moved this from Assigned to In Progress in Prometheus 3.0 Mar 19, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Prometheus 3.0 May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants