Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query filtering in the ingester and storage #5629

Conversation

MasslessParticle
Copy link
Contributor

@MasslessParticle MasslessParticle commented Mar 15, 2022

This PR introduces the concept of a FilteringPipeline/FilteringSampleExtractor that take 0 or more PipelineFilters. PipelineFilters are just log pipelines that run before the Pipeline/Extractor created by the query. Any PipelineFilter that returns true is omitted. The FilteringPipeline is then passed to the underlying iterators like any other pipeline.

Another notable change is that this PR changed the Pipeline interface to also accept timeranges. This is to accommodate the deletes that only partially affect the queried timerange.

The original POC for this work had ChunkFilters but it turns out they're not needed because a delete filter needs to match on labels anyway.

There's a potential future improvement where we use a chunk filter when chunks match a delete's labels and are totally within its timerange.

This PR will change Querying behavior in installations with current delete requests so we should wait until
#5481 is merged because it contains a feature flag for delete behavior

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

@MasslessParticle MasslessParticle force-pushed the tpatterson/storage-ingester-delete-filter branch from ee755c7 to b66f4af Compare April 6, 2022 15:19
@MasslessParticle MasslessParticle marked this pull request as ready for review April 6, 2022 15:21
@MasslessParticle MasslessParticle requested a review from a team as a code owner April 6, 2022 15:21
Copy link
Contributor

@MichelHollands MichelHollands left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few nit comments were added.

pkg/logql/log/metrics_extraction_test.go Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline_test.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline_test.go Show resolved Hide resolved
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some minor issues but overall the code looks great to me.
Please let me know if you need any more inputs on any of my comments.

pkg/ingester/instance.go Outdated Show resolved Hide resolved
pkg/logql/log/metrics_extraction_test.go Outdated Show resolved Hide resolved
pkg/logql/log/metrics_extraction_test.go Show resolved Hide resolved
pkg/logql/log/metrics_extraction_test.go Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline_test.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline_test.go Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
pkg/storage/store.go Outdated Show resolved Hide resolved
@MasslessParticle
Copy link
Contributor Author

MasslessParticle commented Apr 8, 2022

I've addressed nearly all the feedback. I still need some help from @sandeepsukhani to understand the filtering. I've added some tests cases to the filtering and everything behaves as I expect so I'm not sure what I'm missing.

The tests are as follows:

2 filters:

  • filter 1: {foo="bar", bar="baz"} | = e
  • filter 2: {baz="foo"} |= e

Cases (not including timerange stuff):

  • Stream is {foo="bar"}: ok = true because no filter matches this stream
  • Stream is {beep="boop"}: ok = true because no filter matches this stream
  • Stream is {foo="bar", bar="baz"}: ok = false because filter 1 matches
  • Stream is {baz="foo"}: ok = true because filter 2 matches

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good let's make sure the allMatch is good.

@MasslessParticle
Copy link
Contributor Author

All match is good and I've incorporated @sandeepsukhani 's feedback

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested some very minor nits but other than that it LGTM! Nice work!

pkg/logql/log/metrics_extraction_test.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
pkg/logql/log/pipeline.go Outdated Show resolved Hide resolved
@simonswine simonswine mentioned this pull request Apr 14, 2022
3 tasks
@sandeepsukhani sandeepsukhani merged commit 5939d5e into grafana:main Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants