Fix/lifecycler/stuck in leaving #1921
Conversation
Were you able to find out why the ingester was stuck in the LEAVING state? I found myself in the same situation when working with the WAL, and it turned out that I had the Kubernetes termination grace period set very low (30s). Hence, before the ingester could exit gracefully, it was killed, leaving the ring entry in the LEAVING state.
Also, this takes care of issue #1904 (created when I faced the exact same issue).
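To illustrate the grace-period point above: a sketch of raising `terminationGracePeriodSeconds` on an ingester StatefulSet so Kubernetes does not SIGKILL the pod mid-handover (the StatefulSet name and the 2400s value are illustrative assumptions; tune the value to your own flush/transfer duration):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester
spec:
  template:
    spec:
      # The default of 30s is often too short for an ingester to transfer
      # or flush its chunks; if the pod is killed before it finishes
      # leaving, its ring entry stays in LEAVING.
      terminationGracePeriodSeconds: 2400
```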
I see the problem you're trying to fix. The change sounds reasonable to me, but I'm not familiar enough with all the ring's edge cases. @pstibrany what's your take on this?
Overall, this fixes the problem when ingesters reuse their names (e.g. statefulsets).
Signed-off-by: Thor <thansen@digitalocean.com>
This will certainly help the statefulset ingester setup when recovering from an unclean shutdown. Thank you!
If the ingester is deployed as a Kubernetes Deployment, every time it is recreated by Kubernetes it registers itself with a different hostname and address. The zombie ingester stays in the ring forever, and eventually clients trying to read or write will report errors. If the ingester is killed by the OOM killer, it never starts the quit process and leaves its state behind in the ring. Also, after a successful transfer or flush, the ingester sleeps for a while. The same problem happens on the ruler too. Maybe we need a way to clean up dead ingesters from the ring.
@canghai118 I'm not sure there's a safe way to determine if an ingester is a "zombie" or if it's just crash looping and will come back eventually. I'd worry that trying to perform cleanups on what we believe to be a zombie ingester might cause more problems than it solves.
What this PR does: I've observed ingesters that were evicted and then rescheduled. They were unable to gracefully shut down and left their state in the ring as LEAVING. On reschedule, the ingester found itself already in the LEAVING state, set itself to LEAVING again, and was then unable to join the cluster.
This change allows an ingester that finds itself in the LEAVING state with tokens to mark itself as ACTIVE.
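A minimal Go sketch of the recovery logic described above. The names (`IngesterState`, `IngesterDesc`, `recoverState`) are hypothetical simplifications for illustration, not Cortex's actual ring API; the idea is only that a LEAVING entry that still owns tokens is flipped back to ACTIVE on restart instead of getting stuck:

```go
package main

import "fmt"

// IngesterState mirrors the ring states relevant here (simplified).
type IngesterState int

const (
	PENDING IngesterState = iota
	JOINING
	ACTIVE
	LEAVING
)

// IngesterDesc is a cut-down stand-in for an ingester's ring entry.
type IngesterDesc struct {
	State  IngesterState
	Tokens []uint32
}

// recoverState decides which state a restarting ingester should take.
// A nil entry means this is a fresh registration. If a previous instance
// with the same ID was left in LEAVING but still owns tokens (an unclean
// shutdown), we reclaim the tokens and go straight to ACTIVE.
func recoverState(existing *IngesterDesc) IngesterState {
	if existing == nil {
		return PENDING // fresh registration
	}
	if existing.State == LEAVING && len(existing.Tokens) > 0 {
		return ACTIVE // stuck LEAVING entry with tokens: rejoin as ACTIVE
	}
	return existing.State
}

func main() {
	stuck := &IngesterDesc{State: LEAVING, Tokens: []uint32{1, 2, 3}}
	fmt.Println(recoverState(stuck) == ACTIVE) // true
	fmt.Println(recoverState(nil) == PENDING)  // true
}
```

Without the token check, an ingester that crashed while LEAVING and owned no tokens would incorrectly grab ACTIVE without anything to serve, which is why the fix is scoped to entries that still hold tokens.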
Which issue(s) this PR fixes:
N/A
Checklist
- [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]