-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow KafkaClient to take in a list of brokers for bootstrapping #25
Comments
@mumrah this would be a good thing to do. I am working on Zookeeper support for this library. I was planning to start working on support for multiple brokers. Can you let me know the behavior that you are planning in producer/consumer when multiple brokers are provided. (consumer behavior is pretty much documented for kafka - https://cwiki.apache.org/KAFKA/consumer-group-example.html) |
@mahendra the list of brokers is really just used for bootstrapping the rest of the brokers. Any broker in the cluster can give metadata on the whole cluster - like what topics exist, how many partitions they have, and which brokers are leaders for each partition. The reason for having a list of brokers instead of a single bootstrapper is in the event the single bootstrapper is unavailable. ZooKeeper would provide broker discovery as well and would be a good addition. As a side note, I do not want to add the ZooKeeper consumer group rebalancing to this library. |
@mumrah ok. i get it. about zookeeper - do you mean that you don't want the zookeeper support inside python-kafka? as in keep it a a separate library? or only that you want to keep the consumer group re-balancing out? I was planning on doing it without touching the core python-kafka code. Will discuss the design with you in detail. still working on it. |
I'm not opposing to bringing in Kazoo for basic interaction with ZooKeeper (as in reading from /brokers to get the list of brokers). What I want to avoid is attempting to implement the consumer registration/rebalancing detailed here. While this would allow Python clients to cooperate with Java/Scala clients, I don't think it's actually all that valuable and it is pretty complex. |
Hi guys, I recently had problems with this since the only broker given to the KafkaClient was down. Any app could handle this by creating multiple clients with my known brokers but it would be a shame to do this if the library can handle it. I could start working on a solution for this and I was thinking of doing something similar to the KazooClient constructor which takes a comma-separated list of hosts. Kazoo does some fancy mechanics for retries which could become useful but maybe not at this point. I don't think there is a need to rely on Zookeeper to get the brokers list at this point, especially since Kafka design seems to be trying to abstract most of the Zookeeper interactions to simplify the client. |
while working on this, I ended up having troubles with the test rig. It might be my setup so I thought I'd ask here before opening an issue. I tried contacting you with irc on the kafka channel too, no luck. Overall, I am wondering if the tests should have worked first hand or not? Here are some of the changes I needed to do to run the tests.
I made some changes in my fork to make it work but I needed to set every client to auto_commit=False since I am on 0.8 in Kafka, not trunk. I was also wondering if you'd find problematic to add a dependency on mock for unit tests. It makes it rather easy to add mocks and validate underlying calls in a clean way w/o having all the cluster shebang underneath. thanks! |
To address your three points:
I've been out of the Python development world for a couple of years now. Mock sounds interesting, and getting the unit tests back sounds even more interesting :) Looking forward to your patches |
@mrtheb hey, I believe the kafka client handles this in a way. When you connect to a broker, it requests for information about other brokers and keeps the data with itself (and uses it for partitioning and stuff). However, it does not handle automatic fail-overs yet. I have tried to fix this issue as well as specifying a Zookeeper host (for discovering the brokers) in my zookeeper branch (which is under review by @mumrah ) |
@mahendra as you know, the current client will fail if the broker you are connecting to is down when creating it. When that happens, you need something on top to manage multiple clients failover, which you did in your zookeeper branch. I was heading towards doing it in the client. The client already maintains a list of brokers, I thought that adding failover to it would make more sense than doing it on top. Doing it in the client does not prevent you from using zookeeper to hand out a list of brokers to the client either. I don't have a problem with either approach (failover in client or any class on top) but I have a preference to do it in the client since it's the only part I was using in my app so far. I have a need to be able to distribute consumers across machines which could not be handled by the consumer as it is now. |
I am requesting your opinion. Now that tests are in better shape, I resumed work on supporting multiple hosts in the KafkaClient, this commit contains some of the changes I would suggest. I have unit tests lying somewhere in my branching confusion. I think these changes would work well along changes proposed here as well. As I said before, I don't really mind if we do it this way or only through a Zookeeper wrapper above the client. My arguments from my previous comment still stand though. |
@mrtheb Just got a chance to respond to this. Sorry that I took so long. Yes, I agree that this is a problem, and I prefer providing a list of seed brokers than adding an additional dependency. It also appears to me that some parts of v8 was designed so that clients don't have to contact zookeeper. I will support a pull request with the changes in the commit you referenced. |
@mrtheb had a look at the diffs. maybe you can also add support for retry with a different host if KafkaConnection() fails. (does it do that already?) |
Another idea that I would love is if we can pass the list of host:port to the consumer and producer classes. Currently, we create a KafkaClient and pass it along. Reason: While working on the gevent driver, I had to reopen the socket with gevent.socket. By letting consumers and producers deciding their mode of socket access, we are leaving complexity out of the user. |
Thank you both for your answers, I'll try to spend some time completing my work on this as I want to add unit tests covering those cases before I send the pull request. I want to keep this thing as simple as possible so I wouldn't include retry either. I see it as a next step. |
@mrtheb Are you still working on this? I have some time this weekend to finish this if you want help. |
Hi @jimjh, I spent some time on it on Friday, should be able to send the pull request today. |
Damn... I opened a new issue by submitting my pull request. See #69 |
This should be better... I had to clean my repo a bit. |
This is finished and the ticket should be closed. |
Instead of relying on a single Kafka broker for bootstrapping the cluster, we could accept a list of brokers.
The text was updated successfully, but these errors were encountered: