Mysterious flush of underutilised chunks 1hr after ingester rollout #467
On the 15th of June:

![screenshot from 2017-06-19 17-23-20](https://user-images.githubusercontent.com/1504438/27295498-13ea51d0-5515-11e7-8d8c-2de8b93238b2.png)

On the 19th of June:

![screenshot from 2017-06-19 17-23-33](https://user-images.githubusercontent.com/1504438/27295513-1ff314b2-5515-11e7-8fcd-2e25cbcdf091.png)

Both of these happened approximately one hour after an ingester upgrade in which chunks were successfully transferred from terminating ingesters. Chunk max idle is 1h; possibly related?

CC @tomwilkie, any thoughts?
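For reference, a minimal sketch of the kind of idle-flush rule in play here, assuming a 1h max-chunk-idle setting and a per-series last-append timestamp. The names are illustrative, not the actual Cortex code: if a handover stamps the transferred series with the handover time, they all cross the idle threshold together, roughly one hour after the rollout.

```go
package main

import (
	"fmt"
	"time"
)

// maxChunkIdle is a hypothetical setting matching "chunk max idle is 1h".
const maxChunkIdle = time.Hour

// series is a hypothetical in-memory series; lastAppendTime is whatever
// the receiving ingester reconstructed during the transfer.
type series struct {
	name           string
	lastAppendTime time.Time
}

// shouldFlushIdle flushes a series once nothing has been appended to it
// for maxChunkIdle.
func shouldFlushIdle(s series, now time.Time) bool {
	return now.Sub(s.lastAppendTime) >= maxChunkIdle
}

func main() {
	handover := time.Now()
	s := series{name: "example_series", lastAppendTime: handover}
	// If the transfer stamps lastAppendTime with the handover time, every
	// transferred series crosses the idle threshold together, ~1h later.
	fmt.Println(shouldFlushIdle(s, handover.Add(61*time.Minute))) // true
}
```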
---

Do you know if these are spurious? Or did 800k/3 timeseries stop being written to?
---

Oh, could those be series from metrics scraped from the ingesters that have now terminated?
---

Querying for …
---

I assume this was part of a rolling upgrade of all of Cortex? What does …
---

It's almost like the state reconstructed from transferred chunks in the new ingesters is not identical to that in the old, and so they are making different decisions about what should be flushed…
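If that's right, the divergence would come from the transfer path. A minimal sketch, with hypothetical types (nothing here is from the Cortex source), of how a reconstruction can silently disagree with the original about idleness:

```go
package main

import (
	"fmt"
	"time"
)

// wireSeries is a hypothetical transfer payload: it carries the chunk
// data but not the per-series last-append timestamp.
type wireSeries struct {
	metric string
	chunks [][]byte
}

// reconstruct rebuilds in-memory state on the receiving ingester. With
// no timestamp on the wire, it has to stamp "now", so the new ingester's
// idea of idleness diverges from the old one's.
func reconstruct(w wireSeries, now time.Time) (metric string, lastAppend time.Time) {
	return w.metric, now
}

func main() {
	trueLastAppend := time.Now().Add(-50 * time.Minute)
	_ = trueLastAppend // known only to the old ingester, lost in transfer
	metric, lastAppend := reconstruct(wireSeries{metric: "example_series"}, time.Now())
	fmt.Println(metric, "reconstructed lastAppend:", lastAppend)
}
```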
---

This is also happening on rolling reboots.
---

As @awh said, we just had a single ingester replaced, due to a node reboot, and the queue peaked at 600K.
---

Oops.
---

I had a brainwave last night; during ingester handover the old and new ingesters are no longer in …
---

To prevent the ingester picker from picking a temporary node (and allow N concurrent upgrades), we'd need to ensure `replication - quorum - N > 1`, and allow the picker to select fewer nodes during a handover (`quorum < X < replication`).
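A quick arithmetic check of that proposed constraint; the formula is quoted from the comment above and the numbers are illustrative only:

```go
package main

import "fmt"

// canHandover reports whether the proposed inequality
// replication - quorum - n > 1 holds, i.e. whether the picker could
// still assemble a quorum while n ingesters are mid-handover.
func canHandover(replication, quorum, n int) bool {
	return replication-quorum-n > 1
}

func main() {
	for _, c := range []struct{ replication, quorum, n int }{
		{3, 2, 1},
		{5, 2, 1},
		{5, 3, 2},
	} {
		fmt.Printf("replication=%d quorum=%d N=%d -> ok=%v\n",
			c.replication, c.quorum, c.n,
			canHandover(c.replication, c.quorum, c.n))
	}
}
```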
---

Good catch!

We can work around this by including leaving ingesters in the N, but not writing to them, as you suggest. Should be an easy fix.
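A minimal sketch of that workaround, with hypothetical types rather than the actual Cortex ring code: leaving ingesters keep their slot in the replica set, so no temporary node is picked, but they receive no writes.

```go
package main

import "fmt"

type state int

const (
	active state = iota
	leaving
)

type ingester struct {
	addr  string
	state state
}

// pickReplicas returns the first `replication` ingesters in ring order,
// counting LEAVING ones toward the total but excluding them from the
// write targets.
func pickReplicas(ring []ingester, replication int) (targets []string) {
	for _, ing := range ring {
		if replication == 0 {
			break
		}
		replication-- // a leaving ingester still consumes a slot
		if ing.state == active {
			targets = append(targets, ing.addr)
		}
	}
	return targets
}

func main() {
	ring := []ingester{{"ing-1", active}, {"ing-2", leaving}, {"ing-3", active}, {"ing-4", active}}
	// With replication 3, ing-2 holds a slot but receives no writes, and
	// ing-4 is NOT pulled in as a temporary replacement.
	fmt.Println(pickReplicas(ring, 3)) // [ing-1 ing-3]
}
```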
---

Aye, good spot 👏