Unordered chunks #4001

Merged
merged 22 commits into from
Jul 27, 2021

owen-d
Member

@owen-d owen-d commented Jul 15, 2021

This PR:

  1. Extends unordered<>ordered headblock interoperability to the MemChunks.
  2. Reorders chunk data before flushing to storage. This ensures the read path post-ingestion can always take advantage of ordered optimizations.
  3. Adds more testware & benchmarks.

Dependent on: #3995

ref: #1544
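A minimal sketch of the second point (reordering chunk data before flushing), offered as an illustration rather than the code in this PR. The `entry` type and the `isOrdered` and `reorderForFlush` helpers are hypothetical names, and the sketch assumes entries are ordered by timestamp alone:

```go
package main

import "sort"

// entry is a hypothetical stand-in for a log line with its timestamp.
type entry struct {
	ts   int64 // Unix nanoseconds
	line string
}

// isOrdered reports whether entries are in non-decreasing timestamp order.
func isOrdered(entries []entry) bool {
	return sort.SliceIsSorted(entries, func(i, j int) bool {
		return entries[i].ts < entries[j].ts
	})
}

// reorderForFlush returns the entries sorted by timestamp.
// Data that is already ordered is returned as-is, so the common
// in-order case pays only for the ordering check.
func reorderForFlush(entries []entry) []entry {
	if isOrdered(entries) {
		return entries
	}
	out := make([]entry, len(entries))
	copy(out, entries)
	// A stable sort keeps the arrival order of entries sharing a timestamp.
	sort.SliceStable(out, func(i, j int) bool { return out[i].ts < out[j].ts })
	return out
}
```

Sorting once at flush time is what lets readers of stored chunks assume ordered data, at the cost of a copy and a sort only when out-of-order writes actually occurred.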

@owen-d owen-d requested a review from cyriltovena July 15, 2021 13:58
@owen-d owen-d marked this pull request as draft July 16, 2021 17:56
@owen-d owen-d mentioned this pull request Jul 19, 2021
@owen-d owen-d marked this pull request as ready for review July 21, 2021 11:56
@owen-d owen-d requested a review from a team as a code owner July 21, 2021 11:56
if ordered {
	it = iter.NewNonOverlappingSampleIterator(its, "")
} else {
	it = iter.NewHeapSampleIterator(ctx, its)
Contributor

We might need a variant of HeapIterator that just reorders samples/logs, depending on how that affects performance.

I'm guessing this only happens for chunks in memory.

Member Author

Yes, the unordered version only happens on the ingesters. Since we reorder chunks before flushing to storage (if needed), the unoptimized heapIter versions only run on the ingesters themselves, and only if the ingester is receiving out-of-order data.

edit: I should look at refactoring the order detection into a helper fn. These methods are way too complex for my tastes :(
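For readers following the thread, the trade-off discussed here can be sketched as below. This is a simplified illustration, not Loki's iterator code: `sample`, `nonOverlapping`, and `mergeSamples` are hypothetical names, and each block is assumed to be internally sorted. When blocks do not overlap they can simply be concatenated; otherwise a heap-based k-way merge is needed.

```go
package main

import "container/heap"

// sample is a hypothetical timestamped value.
type sample struct {
	ts  int64
	val float64
}

// nonOverlapping reports whether every non-empty block starts at or after
// the end of the previous non-empty block.
func nonOverlapping(blocks [][]sample) bool {
	var last int64
	seen := false
	for _, b := range blocks {
		if len(b) == 0 {
			continue
		}
		if seen && b[0].ts < last {
			return false
		}
		last = b[len(b)-1].ts
		seen = true
	}
	return true
}

// cursor tracks the read position inside one block.
type cursor struct {
	block []sample
	pos   int
}

// cursorHeap is a min-heap of cursors keyed by their current timestamp.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	return h[i].block[h[i].pos].ts < h[j].block[h[j].pos].ts
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeSamples returns all samples in timestamp order, using the cheap
// concatenation path when possible and a heap merge otherwise.
func mergeSamples(blocks [][]sample) []sample {
	var out []sample
	if nonOverlapping(blocks) {
		for _, b := range blocks {
			out = append(out, b...)
		}
		return out
	}
	h := &cursorHeap{}
	for _, b := range blocks {
		if len(b) > 0 {
			heap.Push(h, &cursor{block: b})
		}
	}
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest current timestamp
		out = append(out, c.block[c.pos])
		c.pos++
		if c.pos == len(c.block) {
			heap.Pop(h) // block exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}
```

The concatenation path costs a single pass over the data, while the heap path adds a logarithmic factor per sample, which is why it is worth detecting ordered input up front.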

Contributor

@cyriltovena cyriltovena left a comment

The change looks fantastic. I have one question.

Since we now allow out-of-order writes across multiple chunks (the new one and the previous one), I think we'll end up with overlapping chunks in storage.

Does this mean we need to modify the function below,

https://github.com/grafana/loki/blob/main/pkg/storage/batch.go#L412-L428

which currently seems to assume that chunks from the same stream never overlap?

@owen-d
Member Author

owen-d commented Jul 26, 2021

We shouldn't need to change the buildHeapIterator function because overlap is already explicitly handled in the batching code: https://github.com/grafana/loki/blob/main/pkg/storage/batch.go#L155

Note: this is already accounted for due to replication

@cyriltovena
Contributor

cyriltovena commented Jul 27, 2021

> We shouldn't need to change the buildHeapIterator function because overlap is already explicitly handled in the batching code: https://github.com/grafana/loki/blob/main/pkg/storage/batch.go#L155
>
> Note: this is already accounted for due to replication

I think this handles overlap across different streams. I'll write a test to clear my suspicion.
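One generic way to deal with overlapping chunks, shown here only to illustrate the concern in this exchange, is to split them into groups in which no two chunks overlap, so each group can be read with the ordered fast path and the groups merged afterwards. This sketch does not reproduce the code at the linked lines of batch.go; `chunkRef` and `partitionNonOverlapping` are hypothetical names, and time bounds are assumed to be inclusive.

```go
package main

import "sort"

// chunkRef is a hypothetical chunk descriptor with inclusive time bounds.
type chunkRef struct {
	from, through int64
}

// partitionNonOverlapping greedily assigns chunks to groups so that chunks
// within one group never overlap in time. Overlapping chunks land in
// different groups.
func partitionNonOverlapping(chunks []chunkRef) [][]chunkRef {
	sorted := make([]chunkRef, len(chunks))
	copy(sorted, chunks)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].from < sorted[j].from })

	var groups [][]chunkRef
	for _, c := range sorted {
		placed := false
		for i := range groups {
			last := groups[i][len(groups[i])-1]
			if last.through < c.from { // strictly after the group's last chunk
				groups[i] = append(groups[i], c)
				placed = true
				break
			}
		}
		if !placed {
			groups = append(groups, []chunkRef{c})
		}
	}
	return groups
}
```

With fully ordered input this yields a single group; each additional group reflects a region where out-of-order writes (or replication) produced overlapping chunks.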

Contributor

@cyriltovena cyriltovena left a comment

LGTM,

Let's GOOOOOOOOOOO ! 🥳

@owen-d owen-d merged commit 56256bf into grafana:main Jul 27, 2021
slim-bean pushed a commit that referenced this pull request Aug 5, 2021
* merge feature/unordered-replay

* interoperable head chunks

* memchunk block interop

* retain ordered memchunk optimizations when possible

* tests+bench for unordered chunk reads

* reorder on chunk close

* [wip] ingester stream unordered

* unordered writes default in testware config, fixes OOO bug & removes unused lastChunkTimestamp var

* validity window is 1/2 of max age & fixes old transfer test

* more consistent headblock checking/creation

* more cohesive encoding tests

* unordered stream test with validity bounds

* compat - unordered

* reinstates memchunk defaults when rebounding & updates storage test compatibility

* lint

* reorder across blocks doesn't overflow

* respect chunk configs during rebounding when possible

* only sync checks on ordered writes

(cherry picked from commit 56256bf)