Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize away label=~".*" matchers in TSDB layer. #6996

Closed
wants to merge 7 commits into from
Closed

Optimize away label=~".*" matchers in TSDB layer. #6996

wants to merge 7 commits into from

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Mar 17, 2020

This is alternative to #6995, except it removes these useless matchers in TSDB layer.

Benchmarks in progress, will update the post when finished.

Update: well, first benchmarks show difference where you'd expect -- when using multiple matchers, one of them being ~= ".*". I need to let them run longer, and they block my computer, so I'll do that later.

@roidelapluie
Copy link
Member

Can we also tackle the case !~".*" ?

@pstibrany
Copy link
Contributor Author

Can we also tackle the case !~".*" ?

By returning empty set, and thus postings? Yes, I think we can.

@pstibrany
Copy link
Contributor Author

pstibrany commented Mar 17, 2020

I did not add a new test for label!~".*", since there is already a test that triggered this branch here:

matchers: []*labels.Matcher{labels.MustNewMatcher(labels.MatchEqual, "n", "1"), labels.MustNewMatcher(labels.MatchNotRegexp, "i", "^.*$")},
exp: []labels.Labels{},
},

tsdb/querier.go Outdated Show resolved Hide resolved
tsdb/querier.go Outdated
if m.Type == labels.MatchRegexp && (m.Value == ".*" || m.Value == "^.*$") && len(ms) > 1 {
// Ignore this matcher completely. This matches any value, including no value, so it's a no-op.
// It's safe to ignore, because there must be some matcher matching non-empty string.
// Some tests only use single label=~".*" matcher, and for those, we include the length condition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just about tests, it's about correctness. Can we fast path those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've modified the code to use all postings if we find this kind of matcher. We only return all postings, if there are no other positive matchers.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side.

NOTE: The same thing would be nice to implement in Thanos - we don't use PostingForMatchers.

We do it here: https://github.com/thanos-io/thanos/blob/2dc375b7a64c5f0d7da488b44664fef1d4865871/pkg/store/bucket.go#L1334

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't look right. Can you add to the tests we have for the other regex fast path?

tsdb/querier.go Outdated Show resolved Hide resolved
Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. And as Brian said, some test cases for fast path and cases around that would be good.

@brian-brazil
Copy link
Contributor

We've had regressions before for things like this, so we really need a test.

@pstibrany
Copy link
Contributor Author

I've added more test cases covering the shortcuts, and verified that they work with old version of PostingsForMatchers, as well as with new shortcuts.

We've had regressions before for things like this, so we really need a test.

I'm not quite sure how to approach this test. Any suggestions?

@brian-brazil
Copy link
Contributor

See #6540

@pstibrany
Copy link
Contributor Author

pstibrany commented Mar 18, 2020

See #6540

Do you suggest to move these shortcuts into a separate function to make it more testable? I'd prefer to avoid doing that, as it would complicate the code. Given that, I'm out of ideas how to test it in a way that you're suggesting.

@brian-brazil
Copy link
Contributor

That's how we're doing existing tests of this sort of thing, so I hoped it'd provide some inspiration. I mainly care that it is tested, so any reasonable approach.

@pstibrany
Copy link
Contributor Author

pstibrany commented Apr 1, 2020

That's how we're doing existing tests of this sort of thing, so I hoped it'd provide some inspiration. I mainly care that it is tested, so any reasonable approach.

I think I finally have a good way to test these shortcuts -- I simply check if returned posting is empty or "all", or expected label postings match. PTAL. (TestPostingsForMatchersShortcuts)

@pstibrany
Copy link
Contributor Author

Tests are failing in react-app-test step, seems unrelated to my changes. (Perhaps I shouldn't have rebased on top of master)

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@pstibrany
Copy link
Contributor Author

Is there more I should do here, or does the test look good now?

@brian-brazil
Copy link
Contributor

Thinking on this there, this optimisation isn't correct at all as . doesn't match newline by default in Go.

@pstibrany
Copy link
Contributor Author

Thinking on this there, this optimisation isn't correct at all as . doesn't match newline by default in Go.

That's a valid point. Since it is not very typical to have newlines in label values, and most people mean to match everything when they write .*, would it make sense to make a breaking change and let . match newline too? (By using "^(?s:" + v + ")$" for anchoring)

But I assume that such discussion has already happened in Prometheus before, and the answer is No.

@beorn7
Copy link
Member

beorn7 commented Apr 20, 2020

It's even remotely plausible that somebody uses foo=~".*" to filter out labels with newlines in it. 😢

@brian-brazil
Copy link
Contributor

That would be a breaking change to PromQL, and thus not something that's doable in the current major version of Prometheus. We'd also have to change how regexes across the other repos, and for the SNMP exporter at least newlines in strings have come up.

@pstibrany
Copy link
Contributor Author

You're right. 😞

@pstibrany pstibrany closed this Apr 20, 2020
@pstibrany
Copy link
Contributor Author

Thanks for your review and feedback!

@roidelapluie
Copy link
Member

Thinking on this there, this optimisation isn't correct at all as . doesn't match newline by default in Go.

Should we have unit tests around this?

@roidelapluie
Copy link
Member

Note: We have already got requests from users which use new lines in labels, so that is a thing.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants