fix: select cluster controller for certain actions #60
LGTM 👍
src/client/controller.rs
Outdated
match error {
    Error::Request(RequestError::Poisoned(_) | RequestError::IO(_))
    | Error::Connection(_) => self.invalidate_cached_controller_broker().await,
    Error::ServerError(ProtocolError::LeaderNotAvailable, _) => {}
Does it make sense to retry these variants here?
Oh, you're right. Let me prune that list.
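For readers following the thread, here is a minimal sketch of the kind of error classification under discussion: deciding which failures should drop the cached controller connection before retrying and which should simply be returned. The enums, the `Action` type, and the `classify` function are hypothetical, simplified stand-ins, not the crate's actual types, and the particular split between arms is an assumption, since the thread does not say which variants were pruned.

```rust
/// Hypothetical, simplified stand-in for the protocol error codes.
#[derive(Debug)]
pub enum ProtocolError {
    NotController,
    LeaderNotAvailable,
    Other(i16),
}

/// Hypothetical, simplified stand-in for the client error type.
#[derive(Debug)]
pub enum Error {
    Connection(String),
    Io(String),
    ServerError(ProtocolError),
}

/// What the retry loop should do with a failed controller request.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Drop the cached controller connection, then retry.
    InvalidateAndRetry,
    /// Give up and hand the error back to the caller.
    Fail,
}

/// Assumed policy: transport failures and `NotController` mean the cached
/// connection may be stale; every other server error is returned as-is.
pub fn classify(error: &Error) -> Action {
    match error {
        Error::Connection(_) | Error::Io(_) => Action::InvalidateAndRetry,
        Error::ServerError(ProtocolError::NotController) => Action::InvalidateAndRetry,
        Error::ServerError(_) => Action::Fail,
    }
}
```

Keeping the retryable set small makes it explicit which variants trigger a reconnect, and that set is what the question above is asking about.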
    ))
})?;

*current_broker = Some(Arc::clone(&broker));
Does this have the same derp as partition leaders where brokers can be controller but not realise they are? I presume not??
I think a desync could only lead to `NotController` (I hope Kafka checks this before checking the content of the operation). Let's see if we get some weird errors, and then we can add another check.
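To make the caching pattern in this hunk concrete, here is a rough synchronous sketch of a "get or connect" cache that can be invalidated when a request reports `NotController`. `Broker`, `ControllerCache`, `get_or_connect`, and `invalidate` are hypothetical names chosen for illustration. The real code is async and discovers the controller via cluster metadata, both of which are left out here.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a broker connection.
#[derive(Debug)]
pub struct Broker {
    pub id: i32,
}

/// Caches the connection to whichever broker we believe is the controller.
pub struct ControllerCache {
    current: Mutex<Option<Arc<Broker>>>,
}

impl ControllerCache {
    pub fn new() -> Self {
        Self { current: Mutex::new(None) }
    }

    /// Returns the cached controller connection, or establishes one via
    /// `connect` and caches it. The lock is held while connecting, so
    /// concurrent callers do not open duplicate connections.
    pub fn get_or_connect<F>(&self, connect: F) -> Result<Arc<Broker>, String>
    where
        F: FnOnce() -> Result<Broker, String>,
    {
        let mut current = self.current.lock().unwrap();
        if let Some(broker) = &*current {
            return Ok(Arc::clone(broker));
        }
        let broker = Arc::new(connect()?);
        *current = Some(Arc::clone(&broker));
        Ok(broker)
    }

    /// Forgets the cached connection, e.g. after a `NotController` response,
    /// so the next call looks the controller up again.
    pub fn invalidate(&self) {
        *self.current.lock().unwrap() = None;
    }
}
```

Under this sketch, a stale cache costs one failed request: the caller sees the error, calls `invalidate`, and the retry reconnects to the current controller.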
let topics = client.list_topics().await.unwrap();
// might take a while to converge
😭
We could avoid this by also performing this operation (list topics) on the controller, I guess, but to some extent you have to deal w/ eventual consistency: all read operations (cluster-wide like "list topics" and partition-wide like "get watermark") can always be answered by followers / non-controllers, and the leader/controller might change between our check / connection creation and the actual read request.
Filed #63 so we can think about it.
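As an illustration of what "deal w/ eventual consistency" can look like in a test, here is a small polling helper that re-runs a check until it passes or a deadline expires. `wait_until` and its parameters are hypothetical and not part of the crate. The sketch is synchronous for brevity, whereas the test in this PR is async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Re-runs `check` until it returns `true` or `timeout` has elapsed,
/// sleeping `interval` between attempts. Returns whether the check passed.
/// The check always runs at least once, even with a zero timeout.
pub fn wait_until<F>(mut check: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

In a test, the closure would list topics and report whether the newly created topic is visible yet. A short interval keeps the common case fast while the timeout bounds the worst case.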
We must not use some arbitrary broker for adding/removing topics. All brokers can read though (w/ eventual consistency).
1s is quite long, esp. when users expect rather low-latency operations. It's also a bit of a pain for our tests, because a single error (like `NotAvailable`) can delay a test by a whole second, easily leading to timeouts.