-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent PANIC when leader is None #450
Conversation
happend in tests with tifs: thread 'tokio-runtime-worker' panicked at /home/uli/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tikv-client-0.3.0/src/pd/cluster.rs:270:47: called `Option::unwrap()` on a `None` value Signed-off-by: Ulrich Hornung <hornunguli@gmx.de>
bcff6ae
to
278324c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The panic seems to happen only when PD return GetMembersResponse
without leader
field. But I think it should not happen as only the PD leader can return a successful response of get members request.
So I think the root cause would be, we actually get a GetMemebersResponse
with error (see GrpcServer.GetMembers
). We need the check the header.error
at the first place.
@@ -91,7 +91,9 @@ pub enum Error { | |||
RegionNotFoundInResponse { region_id: u64 }, | |||
/// No leader is found for the given id. | |||
#[error("Leader of region {} is not found", region_id)] | |||
LeaderNotFound { region_id: u64 }, | |||
LeaderOfRegionNotFound { region_id: u64 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not to change this error as some existed user codes will be using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no concept of "Leader of cluster", but "region leader" or "PD leader". so it would be a little confused.
At the same time, it seems that it is not very necessary to add a new error enum, unless you want to handle this specified error. I think use InternalError
the same as existed code of PD would be enough.
I will test and report in case I still see it. Thanks for notification. I assume this PR would be obsolete then? Or should I prepare still some parts of it for merging? |
It is obsolete now. Unless it's not the reason of |
Hello,
I'm currently experimenting with TiFS (https://github.com/Hexilee/tifs).
When copying large files, the TiKV cluster that I have locally gets under a constant load for a longer period of time (~1-2 hours).
During these tests I came accross a panic caused by the rust-client. I located the code responsible for this, and implemented a fix.
Please have a look, tell me what you think and what to adapt to let it get into the main branch.
Thanks and best regards,
cre4ture
log entry of the panic: