Initializing Client with Multiple Brokers #7
Or does it make more sense for the caller to just loop, creating clients until one works? Then it can decide its own error behaviour...
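A caller-side version of that loop might look like the sketch below, assuming a hypothetical single-address constructor `NewClient(id, addr string) (*Client, error)`:

```go
// Sketch of the caller-side approach: try each seed address in turn and
// keep the first client that connects. Names here are illustrative.
func connectAny(id string, addrs []string) (*Client, error) {
	var lastErr error
	for _, addr := range addrs {
		client, err := NewClient(id, addr)
		if err == nil {
			return client, nil
		}
		lastErr = err // remember the failure and move on to the next broker
	}
	return nil, lastErr
}
```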
I'd probably shoot for something like this:

```go
func NewClient(id string, brokers ...string) (*Client, error) {
	for _, broker := range brokers {
		// strings.SplitN returns a slice; strconv.Atoi returns the port and an error
		parts := strings.SplitN(broker, ":", 2)
		host := parts[0]
		port, err := strconv.Atoi(parts[1])
		if err != nil {
			continue // malformed port; try the next broker
		}
		// the retry logic you described above, using host and port
	}
}
```

So the effective change is that ...
This is making me think of another potential issue: if the client cannot connect to all of the listed brokers in Client.update(), it just errors out. This potentially leaks broker connections and isn't necessarily the right thing to do. This is a surprisingly hard problem :/
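One way to avoid the leak, sketched here with hypothetical Broker/Connect/Close methods (not the actual client internals), is to track what was opened and close it on the error path:

```go
// Hypothetical sketch: close whatever we managed to open before erroring
// out, so a partial failure in update() doesn't leak TCP connections.
func (c *Client) update(brokers []*Broker) (err error) {
	var opened []*Broker
	defer func() {
		if err != nil {
			for _, b := range opened {
				b.Close() // undo partial progress on failure
			}
		}
	}()
	for _, b := range brokers {
		if err = b.Connect(); err != nil {
			return err
		}
		opened = append(opened, b)
	}
	return nil
}
```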
As I mentioned in #4, it probably makes sense for the Client, Producer and Consumer constructors to take some sort of Config structs so we can add parameters in a backwards-compatible way. In that case I'm not sure if ...
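For illustration, the Config-struct idea could look like this; the field names are invented, and the point is only that a struct parameter can grow new fields without breaking existing callers:

```go
// Illustrative only: these field names are made up, not an actual API.
type ClientConfig struct {
	MetadataRetries int           // how many times to retry a metadata request
	WaitForElection time.Duration // how long to wait after a leader election
}

// A nil config selects defaults, so adding fields later stays
// backwards-compatible for existing callers.
func NewClient(id string, brokers []string, config *ClientConfig) (*Client, error) {
	if config == nil {
		config = &ClientConfig{MetadataRetries: 3, WaitForElection: 250 * time.Millisecond}
	}
	// ... connect to the brokers using the settings in config ...
}
```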
Additional concern: as per the comment near the end of NewClient, broker metadata requests do not seem to be required to return all the brokers they know about (it is not documented which ones they do return, but I suspect it is only the subset leading a partition whose data is also in the response). This means that even once we connect to a broker and retrieve metadata, we should not discard the other n brokers that the user has provided us with; we may need them later if they were not returned in the metadata and all of the ones that were returned go away.

Secondly, if the user gives us two addresses, the first of which is down, we should not block for two minutes (the TCP connection timeout) waiting for the first one before we try the second. We should launch async connections to all of them and use the first that works (or return an error if none of them work).
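The parallel-connect idea could look roughly like this sketch using the standard library's net.DialTimeout; none of this is the actual client code, and error handling is minimal for brevity:

```go
// Sketch: dial every seed address concurrently and take the first success.
func firstReachable(addrs []string, timeout time.Duration) (net.Conn, error) {
	type result struct {
		conn net.Conn
		err  error
	}
	results := make(chan result, len(addrs)) // buffered so losing dials never block
	for _, addr := range addrs {
		go func(addr string) {
			conn, err := net.DialTimeout("tcp", addr, timeout)
			results <- result{conn, err}
		}(addr)
	}
	var lastErr error
	for i := 0; i < len(addrs); i++ {
		r := <-results
		if r.err == nil {
			remaining := len(addrs) - i - 1
			go func() { // close any connections that succeed after the winner
				for j := 0; j < remaining; j++ {
					if late := <-results; late.err == nil {
						late.conn.Close()
					}
				}
			}()
			return r.conn, nil
		}
		lastErr = r.err
	}
	return nil, lastErr
}
```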
Fixes #7

This does basically the simplest thing that works. It has some flaws, but most importantly it gets the API in place and is basically functional. We can improve it later without breaking API compatibility.

Remaining sub-optimal behaviour:

* If the user provides three addresses, only the last of which is valid, we wait for the first two connections to completely fail (which may take several minutes depending on the TCP timeout) before we try the third. Trying them all in parallel and using the first valid one would be better, as in the sketch above.
* Once we've had an address fail, we discard it forever. Over the lifetime of a long-running cluster, all of the nodes may be down at one point or another, meaning that eventually we may 'run out' of nodes and abort. We should probably just mark the address as recently-failed, and retry it after some time has passed. Only if all nodes have *recently* failed should we abort. A sketch of that bookkeeping follows below.
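The recently-failed bookkeeping from the second bullet might be as simple as a timestamp map; the type and method names here are invented, not the real implementation:

```go
// Illustrative sketch of recently-failed address tracking.
type seedList struct {
	failedAt map[string]time.Time
	backoff  time.Duration // how long an address stays blacklisted
}

func (s *seedList) markFailed(addr string) {
	s.failedAt[addr] = time.Now()
}

// usable reports whether addr has not failed recently and may be retried.
func (s *seedList) usable(addr string) bool {
	t, failed := s.failedAt[addr]
	return !failed || time.Since(t) > s.backoff
}
```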
This has come up in the back of my mind a few times but I've always held off because I'm not sure what sane default behaviours are in all cases. Right now you specify a single host:port to construct a client and it pulls metadata on the rest of the cluster from that. It makes sense to be able to specify a list of brokers, so that if part of the cluster is offline the client can still connect.
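In API terms, the request boils down to usage like the following (addresses are placeholders), matching the variadic constructor sketched earlier in the thread:

```go
// Desired usage: several seed brokers, so the client can still bootstrap
// its metadata when part of the cluster is offline.
client, err := NewClient("myClient", "kafka1:9092", "kafka2:9092", "kafka3:9092")
if err != nil {
	log.Fatal("could not reach any seed broker: ", err)
}
```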