This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

GC task too eagerly closes chunks #844

Closed
woodsaj opened this issue Feb 7, 2018 · 5 comments

@woodsaj
Member

woodsaj commented Feb 7, 2018

With the default config, the GC task closes a chunk once it has received no data for somewhere between 1 and 2 hours (between chunk-max-stale and chunk-max-stale + gc-interval).

However, suppose a chunk has a chunkspan of, say, 6 hours, and you send 1 hour of data and then send nothing for 3 hours. When you start sending again, the data will be rejected, because the chunk has already been closed and flushed to Cassandra.

We need to make sure that chunks are not closed until after the chunk window has passed.
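To make the failure concrete, here is a minimal simulation of the staleness-only GC rule described above, applied to the 6-hour chunkspan scenario. It is a sketch under stated assumptions, not metrictank code: the `Chunk` class and the `closes_on_staleness` and `first_closing_gc_run` helpers are hypothetical, times are plain integer seconds, and the strict `>` comparison against chunk-max-stale is assumed.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass
class Chunk:
    """Hypothetical model of an in-memory chunk (not metrictank's type)."""
    t0: int          # chunk start, aligned to the chunkspan (unix seconds)
    span: int        # chunkspan in seconds
    last_write: int  # wall-clock time of the most recent write

    @property
    def end(self) -> int:
        # first timestamp that belongs to the next chunk
        return self.t0 + self.span


def closes_on_staleness(chunk: Chunk, now: int, max_stale: int) -> bool:
    # Rule described in the issue: close as soon as the chunk has been idle
    # for longer than chunk-max-stale, regardless of its window.
    # The strict ">" comparison is an assumption.
    return now - chunk.last_write > max_stale


def first_closing_gc_run(chunk: Chunk, now: int, gc_interval: int, max_stale: int) -> int:
    """Wall-clock time of the first GC run that closes the chunk, assuming
    GC runs every gc_interval seconds starting at `now`."""
    if gc_interval <= 0:
        raise ValueError("gc_interval must be positive")
    tick = now
    while not closes_on_staleness(chunk, tick, max_stale):
        tick += gc_interval
    return tick


# The scenario from the issue: 6h chunkspan, 1h of data, then 3h of silence.
chunk = Chunk(t0=0, span=6 * HOUR, last_write=1 * HOUR)
closed_at = first_closing_gc_run(chunk, now=0, gc_interval=HOUR, max_stale=HOUR)
resume_at = chunk.last_write + 3 * HOUR
print(f"closed at {closed_at // HOUR}h, writes resume at {resume_at // HOUR}h, "
      f"chunk window ends at {chunk.end // HOUR}h")
# prints "closed at 3h, writes resume at 4h, chunk window ends at 6h"
```

Depending on where the GC ticks fall relative to the last write, the chunk is closed after more than one and at most two idle hours (starting the ticks at 0.5h instead of 0h closes it at 2.5h). Either way it is closed before writes resume and well before its window ends.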

@replay
Contributor

replay commented Feb 7, 2018

Actually, this makes me think it would make sense to be able to configure chunk-max-stale per retention rather than only globally. But that would be too much effort for a quick fix, so it's better to first just check whether the chunk window has passed.

@woodsaj
Member Author

woodsaj commented Feb 7, 2018

I don't think so. We need chunks to be persisted before their datapoints become older than the Kafka retention. So if the Kafka retention is 7.5 hours, we just need to make sure that a chunk is persisted within 7.5 hours of its first datapoint being received.

So rather than a chunk-max-stale setting, I think we should just have a max-chunk-age setting, where the maximum allowed age is kafka-retention - gc-interval, to ensure that we never have unflushed data older than kafka-retention.
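The reason for subtracting gc-interval is that GC only inspects a chunk once per interval, so a chunk can overshoot its age limit by up to one interval before it is closed. The sketch below checks that bound by brute force for the 7.5-hour example. It is illustrative only: `max_chunk_age`, `closes_on_age` and `flush_age` are hypothetical helpers, the 1-hour gc-interval is an assumed value, and the flush itself is treated as instantaneous.

```python
def max_chunk_age(kafka_retention: int, gc_interval: int) -> int:
    """Hypothetical helper: the proposed limit, kafka-retention - gc-interval (seconds)."""
    age = kafka_retention - gc_interval
    if age <= 0:
        raise ValueError("gc-interval must be shorter than kafka-retention")
    return age


def closes_on_age(first_point_received: int, now: int, max_age: int) -> bool:
    # Proposed rule: close once the chunk's first point was received max_age ago or more.
    return now - first_point_received >= max_age


def flush_age(first_point_received: int, first_tick: int, gc_interval: int, max_age: int) -> int:
    """Age of the chunk's first point at the GC run that closes it, with GC
    running every gc_interval seconds from first_tick (>= first_point_received)."""
    if gc_interval <= 0:
        raise ValueError("gc_interval must be positive")
    tick = first_tick
    while not closes_on_age(first_point_received, tick, max_age):
        tick += gc_interval
    return tick - first_point_received


KAFKA_RETENTION = 7 * 3600 + 1800  # 7.5h, the example from the comment above
GC_INTERVAL = 3600                 # 1h, assumed
limit = max_chunk_age(KAFKA_RETENTION, GC_INTERVAL)
# Try every possible phase of the GC ticks relative to the first point.
worst = max(flush_age(0, offset, GC_INTERVAL, limit) for offset in range(GC_INTERVAL))
print(limit, worst)  # prints "23400 26999"
```

With these numbers the limit is 6.5 hours, and in the worst tick alignment the chunk is closed 26999 seconds after its first point, just under the 7.5-hour Kafka retention.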

It looks like this issue is a duplicate of #614.

@woodsaj woodsaj closed this as completed Feb 7, 2018
@woodsaj
Member Author

woodsaj commented Mar 14, 2018

Re-opening this. #614 covers a much wider scope of problems, but it doesn't look like they are going to be fixed anytime soon. This specific issue is customer-impacting and needs to be fixed ASAP.

@Dieterbe
Contributor

Dieterbe commented Mar 14, 2018

It sounds reasonable and correct to only close chunks via GC after max-stale has elapsed AND after the chunk end has passed (e.g. when the wall clock is past the last timestamp of the chunk).
This may still not solve the issue if data is sent with a big lag, but that sounds uncommon.
For realtime or semi-realtime data this should work well. Maybe we should add another 5-minute offset or so to accommodate a slight delay.
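The proposed rule can be sketched as a single predicate. This is a hypothetical illustration of the condition described in the comment, not the code from the referencing commit: `gc_should_close` and `GRACE` are made-up names, the 5-minute grace is the "or so" value floated above, and a chunk is assumed to cover the half-open range `[t0, t0 + span)`.

```python
HOUR = 3600     # seconds
GRACE = 5 * 60  # the "5 minute offset or so" floated above; hypothetical default


def gc_should_close(t0: int, span: int, last_write: int, now: int,
                    max_stale: int, grace: int = GRACE) -> bool:
    """Hypothetical GC check: close only if the chunk is stale AND its window
    (plus a grace period for slightly delayed data) has passed."""
    # The chunk covers [t0, t0 + span), so its last possible timestamp is t0 + span - 1.
    last_ts = t0 + span - 1
    stale = now - last_write > max_stale
    window_passed = now > last_ts + grace
    return stale and window_passed


# Issue scenario: 6h chunkspan, data for the first hour, then 3h of silence.
print(gc_should_close(0, 6 * HOUR, last_write=1 * HOUR, now=3 * HOUR, max_stale=HOUR))  # False
# Data resumed and stopped again at 5h; one minute past the window end is still within the grace.
print(gc_should_close(0, 6 * HOUR, last_write=5 * HOUR, now=6 * HOUR + 60, max_stale=HOUR))  # False
# Once both conditions hold, the chunk is closed.
print(gc_should_close(0, 6 * HOUR, last_write=5 * HOUR, now=7 * HOUR, max_stale=HOUR))  # True
```

Because the two conditions are combined with AND, a chunk that goes idle mid-window stays open until its window (plus grace) has passed. As the comment notes, a point that belongs to an already-closed chunk and arrives later than that would still be rejected.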

@woodsaj if this sounds good to you, I'll make the PR.

@woodsaj
Member Author

woodsaj commented Mar 14, 2018

@Dieterbe yep, let's make that change until #614 is implemented.

Dieterbe added a commit that referenced this issue Mar 15, 2018
fix: GC task too eagerly closes chunks. #844
@Dieterbe Dieterbe added this to the 0.8.2 milestone Dec 14, 2018

3 participants