-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling Unreachable Brokers #9
Comments
I think the solution here is for the failure of It pushes the errors up the stack a bit at least, so that if one broker is unreachable but that's not one we care about, then we continue to work. Thoughts, @burke ? |
Fixes #9. This ended up being more complicated than I had hoped and touched several different areas. TL;DR is that we now connect to the other brokers in the cluster asynchronously. Errors connecting only show up when somebody tries to use that broker. This is better than the old behaviour since it means that if some brokers in a cluster go down but the topics we care about are still available, we just keep going instead of blowing up for no reason. The complicated part is that simply calling `go broker.Connect()` doesn't do what we want, so I had to write a `broker.AsyncConnect()`. The problem occurs if you've got code like this: go broker.Connect() // do some stuff broker.SendSomeMessage() What can happen is that SendSomeMessage can be run before the Connect() goroutine ever gets scheduled, in which case SendSomeMessage will simply return NotConnected. The desired behaviour is that SendSomeMessage waits for Connect() to finish, which means Connect() has to *synchronously* take the broker lock before it launches the asynchronous connect call. Lots of fun. And bonus change in this commit: rather than special-casing leader == -1 in `client.cachedLeader` and adding a big long comment to the LEADER_NOT_AVAILABLE case explaining the fallthrough statement, just delete that partition from the hash. So much easier to follow, I must have been on crack when I wrote the old way.
Spinning this off from #7 to track separately.
When the Client receives metadata from a broker it runs that metadata through
Client.update()
, which tries to connect to any new brokers listed in the metadata. If any of those connections fail, it bails immediately which isn't the right choice. It also potentially leaks broker connections.The text was updated successfully, but these errors were encountered: