Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: select cluster controller for certain actions #60

Merged
merged 5 commits into from
Jan 14, 2022

Conversation

crepererum
Copy link
Collaborator

@crepererum crepererum commented Jan 14, 2022

We must not use some arbitrary broker for adding/removing topics.

All brokers can read though (w/ eventual consistency).

@crepererum crepererum force-pushed the crepererum/controller_handling branch from 1f47ffa to 2d70046 Compare January 14, 2022 11:03
@crepererum crepererum marked this pull request as ready for review January 14, 2022 11:03
@crepererum crepererum mentioned this pull request Jan 14, 2022
@crepererum crepererum force-pushed the crepererum/controller_handling branch from 2d70046 to 7ac1d80 Compare January 14, 2022 11:06
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

match error {
Error::Request(RequestError::Poisoned(_) | RequestError::IO(_))
| Error::Connection(_) => self.invalidate_cached_controller_broker().await,
Error::ServerError(ProtocolError::LeaderNotAvailable, _) => {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to retry these variants here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right. Let me prune that list.

))
})?;

*current_broker = Some(Arc::clone(&broker));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have the same derp as partition leaders where brokers can be controller but not realise they are? I presume not??

Copy link
Collaborator Author

@crepererum crepererum Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a desync could only lead NotController (I hope Kafka is checking this before checking the content of the operation). Let's see if we can some weird errors and then we can add another check.


let topics = client.list_topics().await.unwrap();
// might take a while to converge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😭

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid this by also performing this operation (list topics) on the control I guess, but to some extend you have to deal w/ eventual consistency because all read operations (cluster-wide like "list topics" and partition-wide like "get watermark") can always be answered by followers / non-controllers and why is leader/controller might change between our check / connection creation and the actual read request.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #63 so we can think about it.

We must not use some arbitrary broker for adding/removing topics.

All brokers can read though (w/ eventual consistency).
1s is quite long, esp. when users expect rather low-latency operations.
It's also a bit of a pain for our tests because a single error (like
`NotAvailable`) can delay tests by a whole second easily leading to
timeouts.
@crepererum crepererum force-pushed the crepererum/controller_handling branch from b4e34b7 to 1948b1d Compare January 14, 2022 11:47
@crepererum crepererum force-pushed the crepererum/controller_handling branch from 1948b1d to 48ced12 Compare January 14, 2022 11:54
@crepererum crepererum added the automerge Instruct kodiak to merge the PR label Jan 14, 2022
@kodiakhq kodiakhq bot merged commit 377b1c1 into main Jan 14, 2022
@carols10cents carols10cents deleted the crepererum/controller_handling branch January 21, 2022 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Instruct kodiak to merge the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants