Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(blooms): Match series to newest block only #15481

Merged
merged 5 commits into from
Dec 19, 2024

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Dec 18, 2024

What this PR does / why we need it:

While running bloom filters in production we noticed some Loki clusters that showed a very high percentage of missing chunks when querying blooms, thus resulting in lower filter rate.

screenshot_20241219_081701
Screenshot from Bloom Gateway dashboard

The reason is that old, superseded blocks are still considered up-to-date, because they cover a keyspace that is not covered by newer blocks with smaller keyspaces (due to larger individual series).

            | series fingerprint keyspace
------------+----------------------------------------------------------------
            |    o     o   o  o  o     o    o o    o       o    o   o    o
------------+----------------------------------------------------------------
iteration 1 |    111111111111111111111111111111111111111111111111111111111
iteration 2 |    22222222222222             3333333333333333        444444
iteration 3 |    5555555   6666666     77777777    888888888    9999999999
...
up-to-date  |    555555522266666661111177777777333388888888811119999999999
------------+----------------------------------------------------------------
            |          x

The chart shows the different blocks marked with the numbers 1 to 9 for a subset of the full series fingerprint keyspace. The blocks are generated in multiple successive bloom building iterations. The first block covers a larger keyspace (more series), because the individual blooms in the blocks are smaller in the beginning of the day. Later, the blooms get larger and therefore the block fingerprint ranges gets smaller. However, since we are dealing with fingerprint ranges, not individual fingerprints, the newer blocks cause "gaps" in the range of the previously larger keyspace. In the case above, every block except block 4, are considered up-to-date, since each of them covers a keyspace that is otherwise not covered.

When resolving blocks for a series at query time, we consider looking at all up-to-date blocks, which are referenced by the meta files. The series x in the chart shows, that it is within the range of 3 up-to-date blocks: 1, 2, 5. However, only the newest block (5) may contain the requested series.

This PR changes the block resolver on the index-gateway to only match the newest block to a series, based on the timestamp of the TSDB from with the blocks were generated.

based on meta's TSDB timestamp

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Comment on lines +112 to +117
for i := range res {
t.Logf("%s", res[i].block)
for j := range res[i].series {
t.Logf(" %016x", res[i].series[j].Fingerprint)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mainly for debugging

@chaudum chaudum marked this pull request as ready for review December 19, 2024 08:04
@chaudum chaudum requested a review from a team as a code owner December 19, 2024 08:04
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
The the slice of sources must be the same length as the slice of blocks.
A source at index i is the TSDB identifier from which the block at index i was built.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum merged commit 5846ea2 into main Dec 19, 2024
58 checks passed
@chaudum chaudum deleted the chaudum/bloomfix-match-series-to-newest-block branch December 19, 2024 19:48
mveitas pushed a commit to mveitas/loki that referenced this pull request Jan 6, 2025
**What this PR does / why we need it**:

While running bloom filters in production we noticed some Loki clusters that showed a very high percentage of missing chunks when querying blooms, thus resulting in lower filter rate.

The reason is that old, superseded blocks are still considered up-to-date, because they cover a keyspace that is not covered by newer blocks with smaller keyspaces (due to larger individual series).

```
            | series fingerprint keyspace
------------+----------------------------------------------------------------
            |    o     o   o  o  o     o    o o    o       o    o   o    o
------------+----------------------------------------------------------------
iteration 1 |    111111111111111111111111111111111111111111111111111111111
iteration 2 |    22222222222222             3333333333333333        444444
iteration 3 |    5555555   6666666     77777777    888888888    9999999999
...
up-to-date  |    555555522266666661111177777777333388888888811119999999999
------------+----------------------------------------------------------------
            |          x
```

The chart shows the different blocks marked with the numbers 1 to 9 for a subset of the full series fingerprint keyspace. The blocks are generated in multiple successive bloom building iterations. The first block covers a larger keyspace (more series), because the individual blooms in the blocks are smaller in the beginning of the day. Later, the blooms get larger and therefore the block fingerprint ranges gets smaller. However, since we are dealing with fingerprint ranges, not individual fingerprints, the newer blocks cause "gaps" in the range of the previously larger keyspace. In the case above, every block except block 4, are considered up-to-date, since each of them covers a keyspace that is otherwise not covered.

When resolving blocks for a series at query time, we consider looking at all up-to-date blocks, which are referenced by the meta files. The series `x` in the chart shows, that it is within the range of 3 up-to-date blocks: 1, 2, 5. However, only the newest block (5) may contain the requested series.

This PR changes the block resolver on the index-gateway to only match the newest block to a series, based on the timestamp of the TSDB from with the blocks were generated.

---
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants