
Allow KafkaClient to take in a list of brokers for bootstrapping #25

Closed

mumrah opened this issue May 29, 2013 · 19 comments

Comments

@mumrah
Collaborator

mumrah commented May 29, 2013

Instead of relying on a single Kafka broker for bootstrapping the cluster, we could accept a list of brokers.
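A minimal sketch of what that could look like; the `hosts` parameter, its shape, and the attribute names are illustrative, not the library's actual API:

```python
# Hypothetical constructor sketch: take several bootstrap brokers instead
# of a single host/port pair. Names and signature are illustrative only.
class KafkaClient(object):
    def __init__(self, hosts, client_id="kafka-python"):
        # hosts: iterable of (host, port) pairs; any one of them can
        # serve as the bootstrap broker for cluster metadata.
        self.hosts = list(hosts)
        self.client_id = client_id
        self.brokers = {}  # broker_id -> (host, port), filled from metadata

client = KafkaClient([("kafka1", 9092), ("kafka2", 9092), ("kafka3", 9092)])
```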

@mahendra
Collaborator

@mumrah this would be a good thing to do. I am working on ZooKeeper support for this library and was planning to start on multi-broker support next. Can you let me know the behavior you are planning for the producer/consumer when multiple brokers are provided? (Consumer behavior is fairly well documented for Kafka: https://cwiki.apache.org/KAFKA/consumer-group-example.html)

@mumrah
Collaborator Author

mumrah commented Jun 13, 2013

@mahendra the list of brokers is really just used for bootstrapping discovery of the rest of the cluster. Any broker in the cluster can give metadata on the whole cluster - what topics exist, how many partitions they have, and which broker is the leader for each partition. The reason for accepting a list of brokers instead of a single bootstrapper is to cover the case where that single bootstrapper is unavailable.
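As a rough illustration of that bootstrap step (plain stdlib sockets; the helper name is made up):

```python
import socket

def bootstrap(hosts, timeout=5):
    # Try each bootstrap broker in turn; the first one that accepts a
    # connection can answer a metadata request for the whole cluster.
    for host, port in hosts:
        try:
            return socket.create_connection((host, port), timeout)
        except socket.error:
            continue  # this bootstrapper is down, try the next one
    raise RuntimeError("unable to bootstrap from any of %r" % (hosts,))
```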

ZooKeeper would provide broker discovery as well and would be a good addition. As a side note, I do not want to add the ZooKeeper consumer group rebalancing to this library.

@mahendra
Collaborator

@mumrah ok, I get it.

About ZooKeeper - do you mean that you don't want the ZooKeeper support inside python-kafka, as in keeping it as a separate library? Or only that you want to keep the consumer group re-balancing out?

I was planning on doing it without touching the core python-kafka code. I'll discuss the design with you in detail; still working on it.

@mumrah
Collaborator Author

mumrah commented Jun 13, 2013

I'm not opposed to bringing in Kazoo for basic interaction with ZooKeeper (as in reading from /brokers to get the list of brokers). What I want to avoid is attempting to implement the consumer registration/rebalancing detailed here. While that would allow Python clients to cooperate with Java/Scala clients, I don't think it's actually all that valuable, and it is pretty complex.
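For reference, the basic Kazoo interaction would be small. A sketch, assuming the 0.8-style JSON registration under /brokers/ids (the helper name is made up):

```python
import json
from kazoo.client import KazooClient

def brokers_from_zookeeper(zk_hosts):
    # Read the registered brokers from ZooKeeper; assumes each child of
    # /brokers/ids holds JSON with "host" and "port" keys (0.8-style).
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        brokers = []
        for broker_id in zk.get_children("/brokers/ids"):
            data, _stat = zk.get("/brokers/ids/%s" % broker_id)
            info = json.loads(data)
            brokers.append((info["host"], info["port"]))
        return brokers
    finally:
        zk.stop()

# e.g. brokers_from_zookeeper("zk1:2181,zk2:2181")
```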

@mrtheb
Collaborator

mrtheb commented Sep 27, 2013

Hi guys, I recently had problems with this since the only broker given to the KafkaClient was down. Any app could handle this by creating multiple clients from its known brokers, but it would be a shame to do that if the library can handle it.

I could start working on a solution for this. I was thinking of doing something similar to the KazooClient constructor, which takes a comma-separated list of hosts. Kazoo also does some fancy retry mechanics which could become useful, but maybe not at this point.
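Something along these lines for the constructor input (illustrative helper; the default port here is just an assumption):

```python
def parse_hosts(hosts):
    # Accept a KazooClient-style comma-separated string such as
    # "kafka1:9092,kafka2:9092" and return (host, port) pairs.
    pairs = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if ":" in entry:
            host, port = entry.rsplit(":", 1)
            pairs.append((host, int(port)))
        else:
            pairs.append((entry, 9092))  # assumed default port
    return pairs
```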

I don't think there is a need to rely on ZooKeeper to get the broker list at this point, especially since the Kafka design seems to be trying to abstract most of the ZooKeeper interactions away to simplify clients.

@mrtheb
Collaborator

mrtheb commented Sep 30, 2013

Hi @mumrah, @mahendra,

while working on this, I ended up having trouble with the test rig. It might be my setup, so I thought I'd ask here before opening an issue. I tried contacting you on IRC in the kafka channel too, no luck.

Overall, I am wondering whether the tests should have worked out of the box. Here are some of the changes I needed to make to run them.

  • The kafka-src submodule points to an older release (0.7). I updated my own environment to make it work, but are you still using this setup?
  • In test resources, kafka.properties still uses zk.connect; this changed quite a while ago.
  • In fixtures, when starting KafkaFixture, it waits for a specific log message, but the casing of that message has changed in Kafka itself.

I made some changes in my fork to make it work, but I needed to set every client to auto_commit=False since I am on Kafka 0.8, not trunk.

I was also wondering if you'd find it problematic to add a dependency on mock for unit tests. It makes it rather easy to add mocks and validate underlying calls in a clean way, without the whole cluster shebang underneath - something like the sketch below (all names are made up for illustration).
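```python
import unittest
from mock import MagicMock

class TestBrokerFailover(unittest.TestCase):
    def test_falls_back_to_second_broker(self):
        # Hypothetical example: the connection factory raises for the
        # first host and succeeds for the second; a failover-aware
        # client should keep going, and we can assert on the calls.
        conn_factory = MagicMock(side_effect=[IOError("down"), MagicMock()])
        for host, port in [("kafka1", 9092), ("kafka2", 9092)]:
            try:
                conn = conn_factory(host, port)
                break
            except IOError:
                continue
        conn_factory.assert_called_with("kafka2", 9092)

if __name__ == "__main__":
    unittest.main()
```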

thanks!
marc

@mumrah
Collaborator Author

mumrah commented Sep 30, 2013

@mrtheb

To address your three points:

  • kafka-src needs to get updated to point to the intended revision. Run git submodule update. Annoying, yes, I know.
  • zk.connect is the correct property for the version of trunk we currently test against.
  • The third point isn't a problem with the correct version of kafka-src.

I've been out of the Python development world for a couple of years now. Mock sounds interesting, and getting the unit tests back sounds even more interesting :)

Looking forward to your patches

@mahendra
Collaborator

mahendra commented Oct 1, 2013

@mrtheb hey, I believe the kafka client handles this in a way. When you connect to a broker, it requests information about the other brokers and keeps that data with itself (and uses it for partitioning and such). However, it does not handle automatic fail-over yet. I have tried to fix this issue, as well as allowing a ZooKeeper host to be specified (for discovering the brokers), in my zookeeper branch (which is under review by @mumrah).

@mrtheb
Collaborator

mrtheb commented Oct 1, 2013

@mahendra as you know, the current client will fail if the broker you are connecting to is down when creating it. When that happens, you need something on top to manage failover across multiple clients, which you did in your zookeeper branch. I was heading towards doing it in the client instead. The client already maintains a list of brokers, so I thought adding failover to it would make more sense than doing it on top. Doing it in the client does not prevent you from using ZooKeeper to hand the client a list of brokers either.
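To make the idea concrete, the failover I have in mind is roughly this (the attribute and method names are placeholders, not the actual client internals):

```python
def request_with_failover(client, request):
    # Walk the broker list the client already maintains and return the
    # first successful response; placeholder names throughout.
    errors = []
    for host, port in client.brokers:
        try:
            return client.send_request(host, port, request)
        except IOError as e:
            errors.append((host, port, e))
    raise IOError("all brokers failed: %r" % (errors,))
```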

I don't have a problem with either approach (failover in the client or in a class on top), but I have a preference for doing it in the client, since it's the only part I was using in my app so far. I also need to be able to distribute consumers across machines, which the consumer cannot handle as it is now.

@mrtheb
Collaborator

mrtheb commented Oct 4, 2013

@mahendra, @mumrah, @jimjh

I am requesting your opinion.

Now that the tests are in better shape, I resumed work on supporting multiple hosts in the KafkaClient; this commit contains some of the changes I would suggest. I have unit tests lying somewhere in my branching confusion. I think these changes would also work well alongside the changes proposed here.

As I said before, I don't really mind if we do it this way or only through a Zookeeper wrapper above the client. My arguments from my previous comment still stand though.

@jimjh
Contributor

jimjh commented Oct 9, 2013

@mrtheb Just got a chance to respond to this. Sorry that I took so long.

Yes, I agree that this is a problem, and I prefer providing a list of seed brokers to adding an additional dependency. It also appears to me that some parts of 0.8 were designed so that clients don't have to contact ZooKeeper. I will support a pull request with the changes in the commit you referenced.

@mahendra
Collaborator

mahendra commented Oct 9, 2013

@mrtheb had a look at the diffs. Maybe you can also add support for retrying with a different host if KafkaConnection() fails. (Does it do that already?)

@mahendra
Collaborator

Another idea I would love: let the consumer and producer classes accept the list of host:port pairs directly.

Currently, we create a KafkaClient and pass it along.
I would prefer that consumers and producers create their own KafkaClient instances.

Reason: while working on the gevent driver, I had to reopen the socket with gevent.socket. By letting consumers and producers decide their mode of socket access, we keep that complexity away from the user.
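A rough sketch of that design (the stub class and parameter names are illustrative, not the actual code):

```python
import socket

class KafkaClient(object):
    # Stub standing in for the real client, just to show the shape.
    def __init__(self, hosts, socket_factory=socket.create_connection):
        self.hosts = list(hosts)
        self.socket_factory = socket_factory

class SimpleConsumer(object):
    def __init__(self, hosts, group, topic,
                 socket_factory=socket.create_connection):
        # The consumer owns its client, so a gevent driver could pass
        # gevent.socket.create_connection instead of the blocking one.
        self.client = KafkaClient(hosts, socket_factory=socket_factory)
        self.group = group
        self.topic = topic

consumer = SimpleConsumer([("kafka1", 9092)], "my-group", "my-topic")
```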

@mrtheb
Collaborator

mrtheb commented Oct 15, 2013

Thank you both for your answers. I'll try to spend some time completing my work on this, as I want to add unit tests covering those cases before I send the pull request.

I want to keep this as simple as possible, so I wouldn't include retries either. I see that as a next step.

@jimjh
Contributor

jimjh commented Nov 9, 2013

@mrtheb Are you still working on this? I have some time this weekend to finish this if you want help.

@mrtheb
Collaborator

mrtheb commented Nov 10, 2013

Hi @jimjh, I spent some time on it on Friday, should be able to send the pull request today.

@mrtheb
Collaborator

mrtheb commented Nov 10, 2013

Damn... I opened a new issue by submitting my pull request. See #69

@mrtheb
Collaborator

mrtheb commented Nov 14, 2013

This should be better... I had to clean my repo a bit.

@wizzat
Collaborator

wizzat commented Mar 22, 2014

This is finished and the ticket should be closed.
