Handoff to pending ingester did not complete cleanly #458
Based on the lack of …
The same thing happened again when kured rebooted the next ingester-hosting node: https://cloud.weave.works/prom/loud-breeze-77/notebook/8c4e88ba-a6d6-481c-be68-ee83b2056004 The precipitous ramp-up of chunks on the new ingester implies that the transfer was largely successful, but the sender never concluded that it was.
The final step of …

On the receiving side, we are seeing messages like this:
The first message corresponds roughly with the assumed end of the transfer based on the graphs above, and the messages continue every few minutes even up until the time of writing (14:25:00Z). Perhaps the response cannot be written back for some reason, for example because the transfer took long enough that a deadline was exceeded?
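To make the deadline hypothesis concrete, here is a minimal Go sketch (not the actual Cortex transfer code; `sendChunks` and the durations are made up) of the suspected failure mode: the sender bounds the handoff with a context deadline, the receiver keeps working past it, and the sender therefore never observes a clean completion even though the chunks arrived.

```go
// Sketch only: illustrates a sender-side deadline firing before a long
// transfer completes, so the sender reports failure while the receiver
// still finishes its side.
package main

import (
	"context"
	"fmt"
	"time"
)

// sendChunks is a stand-in for the chunk-transfer RPC; only the timing
// behaviour matters here.
func sendChunks(ctx context.Context, transferDuration time.Duration) error {
	select {
	case <-time.After(transferDuration): // receiver finishes the transfer
		return nil
	case <-ctx.Done(): // sender-side deadline fires first
		return ctx.Err() // context.DeadlineExceeded
	}
}

func main() {
	// Deadline shorter than the (long) transfer: the receiver would still
	// complete its side, but the sender never concludes the handoff succeeded.
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()

	if err := sendChunks(ctx, 5*time.Second); err != nil {
		fmt.Println("transfer not acknowledged:", err)
	}
}
```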
Goroutine dump of the terminating ingester:
Corresponds to https://github.com/weaveworks/cortex/blob/1ba9d6d82c0c451835f6b9f8fdeecd12ebda5646/pkg/ingester/ingester_lifecycle.go#L338: the terminating ingester is blocked waiting for all the flush loops to close.
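For illustration, a minimal sketch of the shape of that wait, assuming (as the dump suggests) that shutdown blocks on a WaitGroup released only when every flush loop returns; a single flush loop that never observes the quit signal is enough to hang termination. The names here are hypothetical, not the real ingester_lifecycle.go code.

```go
// Sketch only: one flush loop that never exits keeps wg.Wait() blocked,
// which is the hang seen in the goroutine dump.
package main

import (
	"fmt"
	"sync"
	"time"
)

func flushLoop(wg *sync.WaitGroup, quit <-chan struct{}, stuck bool) {
	defer wg.Done()
	for {
		if stuck {
			// Models a flush that retries forever without checking quit.
			time.Sleep(time.Hour)
			continue
		}
		select {
		case <-quit:
			return
		case <-time.After(50 * time.Millisecond):
			// flush a batch of chunks
		}
	}
}

func main() {
	var wg sync.WaitGroup
	quit := make(chan struct{})

	wg.Add(2)
	go flushLoop(&wg, quit, false)
	go flushLoop(&wg, quit, true) // this one never exits

	close(quit)

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case <-done:
		fmt.Println("all flush loops stopped; safe to leave the ring")
	case <-time.After(2 * time.Second):
		// In the real ingester there is no such timeout: it simply blocks
		// here until Kubernetes kills the pod.
		fmt.Println("shutdown blocked waiting for flush loops")
	}
}
```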
Argh - I didn't realise that 105841e wasn't deployed in production.
Again, whilst testing the reboot daemon, the following situation arose: a cortex ingester ingester-3378111688-nyll4 with address 10.244.228.234 was drained by the reboot daemon, and a new ingester ingester-3378111688-d5lpl with address 10.244.202.96 was scheduled to take its place.

From the draining ingester:
From the new ingester:
We can see that the terminating ingester does attempt to send chunks to the new one, but does not manage to remove itself from the ring and exit cleanly before the 40m shutdown limit is reached and it is killed by k8s, leaving the ring in an unhealthy state.
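As a rough sketch of why the ring is left unhealthy (hypothetical transferChunks/unregisterFromRing helpers, not the real shutdown path): the ingester only removes itself from the ring after the handoff finishes, so if the handoff outlives the termination grace period, the pod is SIGKILLed before the unregister step ever runs.

```go
// Sketch only: shutdown sequence where ring removal happens after the
// handoff; a SIGKILL during the handoff leaves a stale ring entry.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func transferChunks() error {
	time.Sleep(45 * time.Minute) // a handoff that overruns the ~40m grace period
	return nil
}

func unregisterFromRing() {
	fmt.Println("removed ingester from ring") // never reached if SIGKILLed first
}

func main() {
	term := make(chan os.Signal, 1)
	signal.Notify(term, syscall.SIGTERM)

	<-term // kubelet starts the termination grace period here
	if err := transferChunks(); err != nil {
		fmt.Println("handoff failed:", err)
	}
	unregisterFromRing()
}
```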