Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for matchers in labels API #3658

Closed
pstibrany opened this issue Jan 7, 2021 · 15 comments · Fixed by grafana/mimir#3
Closed

Implement support for matchers in labels API #3658

pstibrany opened this issue Jan 7, 2021 · 15 comments · Fixed by grafana/mimir#3
Labels

Comments

@pstibrany
Copy link
Contributor

Prometheus has recently added support for matchers in labels API (/api/v1/labels and /api/v1/label/<label_name>/values) in prometheus/prometheus#8301.

Cortex needs to do the same to stay compatible.

@pstibrany
Copy link
Contributor Author

Change in Thanos: thanos-io/thanos#3566

@AutonomicMattRoy
Copy link

+1, Would greatly appreciate this functionality

@pstibrany
Copy link
Contributor Author

pstibrany commented Apr 28, 2021

Current status: match[] is supported for Label Values when querying ingesters (#3806) and store (#4133).

match[] is not yet supported by Label Names API.

@colega
Copy link
Contributor

colega commented Jul 12, 2021

The current situation is that the label names API supports matchers by using a Select() call, and bringing all the series without their samples because a "series" hint function is being passed down to the Select() call.

That Select() call goes to the querier's implementation, which in turn is calling the distributorQuerier.Select() (and also the store, if implemented, but I'm skipping that in this comment) which detects the "series" hint and calls MetricsForLabelMatchers() on the Distributor, which is calling that method on all the ingesters.

Issue: that MetricsForLabelMatchers is a one shot request, and if there are too many metrics in the ingesters, they fail to respond because of the gRPC max response size limit: we've seen responses of more than 40mb there.

Ironically, if we do a range_query instead, i.e., we ask for all the previous series plus their samples, the query works because it performs a streaming select instead, which doesn't suffer that response size limitation because the data is being brought in multiple packets.


So, I see two approaches here:

One goes through adding matchers to the LabelNames method of the LabelQuerier interface. I don't have enough experience to clearly say the scope of that change, but I guess that first it would need to be changed in Prometheus, and then in both Thanos and Cortex in order to adopt the change (the cyclic dependencies here and synchronizing the release is what scares me the most).

I also see an easier one, which consists in pushing the hints down to the ingesters. Instead of checking the hints in distributor queryable we can add hints to the Distributor.Query and Distributor.QueryStream signatures, send them to the same methods in the ingester, and then use them for Select() on user's TSDB, that code already seem to support the "series" hint, so we're good there.

@bboreham
Copy link
Contributor

I think we should be sending the matchers down to each ingester then unioning the sets that come back, which will likely be tiny compared to bringing back all the series then stripping the labels out in the querier.
This is in line with your first option, except we are more concerned with the protocol definition in ingester.pb.go than the Go interface in Prometheus.

@colega
Copy link
Contributor

colega commented Jul 12, 2021

The matchers are already being sent to each ingester, and each ingester in the MetricsForLabelMatchers, however it seems that it's just not enough since there are too many values in some of the labels. Do you mean adding the matchers to LabelNames method?

OTOH, adding a Matchers *LabelMatchers field to any of the mentioned options (QueryRequest or LabelNamesRequest) should be a backwards compatible change, what are the concerns there? The release process for this feature? (like queriers being deployed before the ingesters, and ingesters ignoring the matchers on LabelValues call?)

@bboreham
Copy link
Contributor

Do you mean adding the matchers to LabelNames method?

Yes, and calling that from querier.

@bboreham
Copy link
Contributor

To be clearer, we don't want to use the Prometheus implementation https://github.com/prometheus/prometheus/blob/263847e64a0581eba34bd907bd8d1b169be15b95/web/api/v1/api.go#L536 which assumes it's free to pass millions of series around; we want one that knows it's calling ingesters.

@colega
Copy link
Contributor

colega commented Jul 12, 2021

We would still need to change that api implementation to actually call the LabelValues method with matchers, right? (because we're using that API implementation for Cortex API too)

So we need to make the change in Prometheus first, and then upgrade the Prometheus here and support it.

@bboreham
Copy link
Contributor

Well, what I meant is Cortex would replace that implementation, but I haven't looked closely into whether it would be reasonable to change it upstream.

@pracucci
Copy link
Contributor

pracucci commented Jul 14, 2021

Issue: that MetricsForLabelMatchers is a one shot request, and if there are too many metrics in the ingesters, they fail to respond because of the gRPC max response size limit: we've seen responses of more than 40mb there.

Given this issue ☝️ (which is I guess it's what we want to solve) I feel a bit lost from the follow up discussion. Let me done a step back.

The current situation is that the label names API supports matchers by using a Select() call, and bringing all the series without their samples because a "series" hint function is being passed down to the Select() call.

That links to the Prometheus implementation, but Cortex has a different implementation for the "label names" API. The "label names" implementation in Cortex doesn't support matchers yet and the request is propagated down to ingesters where we call TSDB's querier.LabelNames().

MetricsForLabelMatchers() is not involved in the "label names" API in Cortex, while it's used by the "series" API. So, before digging deeper, I would like to better understand which exact problem do we want to solve. Eg.

  • Add matchers support to "label names" API?
  • Implement MetricsForLabelMatchers() in a grpc-streaming fashion to avoid hitting the max frame size?
  • ...?

@pracucci
Copy link
Contributor

My previous comment was wrong. Actually MetricsForLabelMatchers() is used when matchers are provided, while querier.LabelNames() when matchers are no provided.

@pracucci
Copy link
Contributor

We would still need to change that api implementation to actually call the LabelValues method with matchers, right? (because we're using that API implementation for Cortex API too)
So we need to make the change in Prometheus first, and then upgrade the Prometheus here and support it.

I think so and I've seen you've already opened a PR prometheus/prometheus#9083. In my opinion, this is the best solution.

A pure Cortex solution to the max grpc frame size could be implementing MetricsForLabelMatchers() in a grpc-streaming fashion (similar to QueryStream()). However, I believe going through adding matchers to querier.LabelNames() would allow us to implement it more efficiently.

@pstibrany
Copy link
Contributor Author

Still valid.

@pstibrany pstibrany reopened this Jul 23, 2021
colega added a commit to colega/cortex that referenced this issue Jul 26, 2021
Since prometheus/prometheus#9083 prometheus now
provides matchers to the LabelNames method on the LabelQuerier
interface, so in order to upgrade Prometheus we need to support that.

This partially solves
cortexproject#3658 as now the block
store queryable uses the LabelNames method with matchers.

However, we're still using the ingesters' MetricsForLabelMatchers method
to perform the LabelNames w/matchers call on the distributor. That
change will be tackled separately as it breaks ingester's contracts and
needs to be released with a feature flag to perform a backwards
compatible release.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
pracucci pushed a commit that referenced this issue Jul 26, 2021
* Upgrade Prometheus to LabelNames with matchers

Since prometheus/prometheus#9083 prometheus now
provides matchers to the LabelNames method on the LabelQuerier
interface, so in order to upgrade Prometheus we need to support that.

This partially solves
#3658 as now the block
store queryable uses the LabelNames method with matchers.

However, we're still using the ingesters' MetricsForLabelMatchers method
to perform the LabelNames w/matchers call on the distributor. That
change will be tackled separately as it breaks ingester's contracts and
needs to be released with a feature flag to perform a backwards
compatible release.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Updated CHANGELOG.md

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@stale
Copy link

stale bot commented Oct 22, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 22, 2021
@stale stale bot closed this as completed Nov 6, 2021
alvinlin123 pushed a commit to ac1214/cortex that referenced this issue Jan 14, 2022
* Upgrade Prometheus to LabelNames with matchers

Since prometheus/prometheus#9083 prometheus now
provides matchers to the LabelNames method on the LabelQuerier
interface, so in order to upgrade Prometheus we need to support that.

This partially solves
cortexproject#3658 as now the block
store queryable uses the LabelNames method with matchers.

However, we're still using the ingesters' MetricsForLabelMatchers method
to perform the LabelNames w/matchers call on the distributor. That
change will be tackled separately as it breaks ingester's contracts and
needs to be released with a feature flag to perform a backwards
compatible release.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Updated CHANGELOG.md

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Alvin Lin <alvinlin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants