
Add Consumer group data, with offsets and lag #108

Closed
wants to merge 1 commit

Conversation

Contributor

@cvcal cvcal commented Aug 4, 2015

This addresses issue #80

@cvcal
Contributor Author

cvcal commented Aug 5, 2015

Two solutions to this problem arrived on the same day (the Walmart one, #109, and this one).

Walmart's solution doesn't use Curator tree caches, unlike the existing topics cache and the consumer cache in my solution, which reduces the amount of memory used. Walmart also adds filtering of inactive consumers. However, its consumer and offset code is only minimally integrated into the existing code, and some features, like modifying the topic view to add offsets and a list of consumers reading from the topic, are less feasible without that integration.

I am thinking of replacing the way I get consumer data with Walmart's method, to take advantage of the memory efficiency, while keeping the integration with the current kafka-manager's actor system. This means that calls to the consumer list view will be faster due to the caching in BrokerViewCacheActor, and it will also allow for a more complete integration of the consumer and offset information into the UI.
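
(For reference, a rough sketch of what the cache-based consumer data could look like: a Curator TreeCache over the /consumers subtree that is read from memory rather than hitting ZooKeeper per request. The class and method names here are illustrative, not the actual code in this PR.)

import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.recipes.cache.TreeCache
import scala.collection.JavaConverters._

class ConsumersTreeCache(curator: CuratorFramework) {
  // Mirror the whole /consumers subtree in memory: group ids, owners and offsets.
  private val cache: TreeCache =
    TreeCache.newBuilder(curator, "/consumers").setCacheData(true).build()

  def start(): Unit = cache.start()
  def close(): Unit = cache.close()

  // Consumer groups currently known to the cache.
  def consumerGroups: List[String] =
    Option(cache.getCurrentChildren("/consumers"))
      .map(_.asScala.keys.toList).getOrElse(Nil)

  // Committed offset for one (group, topic, partition), if present.
  def committedOffset(group: String, topic: String, partition: Int): Option[Long] =
    Option(cache.getCurrentData(s"/consumers/$group/offsets/$topic/$partition"))
      .map(cd => new String(cd.getData).toLong)
}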

Please reply to let me know if this sounds like a good solution.

@jisookim0513
Contributor

I might be wrong, but I prefer your original approach of using a consumer cache, despite the fact that it takes more memory than fetching data directly from ZooKeeper. One of the things I like about Kafka Manager is that it maintains a cache of the Kafka state, so it doesn't have to fetch data from ZooKeeper every time I want information about brokers or topics. I think the consumer cache fits the design of Kafka Manager better, and it could also help future work: if we add a REST API to Kafka Manager for retrieving consumer information, it wouldn't have to fetch information from ZooKeeper but could return values directly from its cache. It's just my two cents.

@cvcal
Contributor Author

cvcal commented Aug 7, 2015

I agree; that was why I used the cache in the first place, but I thought it was worth trying to call ZooKeeper directly to check whether performance improved significantly. At first glance, calling ZooKeeper directly doesn't seem to make a big difference in memory usage anyway.

@cvcal
Contributor Author

cvcal commented Aug 7, 2015

If I were to add filtering of inactive consumers (a consumer that doesn't list any topics it reads from), should I make it an option, say as part of ClusterConfig?
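
(A minimal sketch of what such an option could look like; the flag name and the filter helper are hypothetical, and the real ClusterConfig has more fields than shown.)

// Hypothetical flag on the cluster configuration; the name is illustrative only.
case class ClusterConfig(name: String,
                         zkHosts: String,
                         filterInactiveConsumers: Boolean = false)

object ConsumerFilter {
  // A consumer group counts as "inactive" if it lists no topics it reads from.
  def visibleConsumers(config: ClusterConfig,
                       allConsumers: List[String],
                       topicsFor: String => List[String]): List[String] =
    if (config.filterInactiveConsumers) allConsumers.filter(c => topicsFor(c).nonEmpty)
    else allConsumers
}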

@patelh
Collaborator

patelh commented Aug 8, 2015

Can you add tests?

@cvcal
Contributor Author

cvcal commented Aug 9, 2015

I can add tests that simply confirm there aren't any consumers; at least they would verify that the call goes through, etc. I'm not sure how to modify your SeededBroker to add a seeded consumer as well as a topic. In the meantime, here's a simple addition to the KafkaStateActor test that fetches empty lists of consumers.
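
(Roughly the shape such a test could take inside the existing KafkaStateActor suite; KSGetConsumers and ConsumerList are stand-ins for whatever the PR's actual request/response messages are called, and kafkaStateActor is assumed to be the actor already set up by the suite.)

import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

test("get empty consumer list") {
  implicit val timeout: Timeout = 10.seconds
  val future = kafkaStateActor ? KSGetConsumers
  val result = Await.result(future, timeout.duration)
  // With no consumers registered in the seeded ZooKeeper, the actor should
  // still answer, just with an empty list.
  assert(result.asInstanceOf[ConsumerList].list.isEmpty)
}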

@jisookim0513
Contributor

Could you squash the commits?

@cvcal
Contributor Author

cvcal commented Aug 12, 2015

Currently, the call to get the partition offsets with SimpleConsumer is blocking; I'm going to move all of those operations into Futures.

@patelh Do you have a preference as to how I get an executionContext? If I'm going to have futures in the ActorModel, I'll need a local ec to do some of the processing.
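
(To illustrate the direction, a sketch of a Future-wrapped offset fetch using the 0.8-era SimpleConsumer API that kafka-manager depended on at the time; this is not the PR's exact code, and the object/method names are mine.)

import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer
import scala.concurrent.{ExecutionContext, Future}

object OffsetFetchSketch {
  // Fetch the latest offset for one partition, off the calling thread.
  def latestOffset(host: String, port: Int, topic: String, partition: Int)
                  (implicit ec: ExecutionContext): Future[Long] = Future {
    val consumer = new SimpleConsumer(host, port, 10000, 64 * 1024, "kafka-manager-offsets")
    try {
      val tap = TopicAndPartition(topic, partition)
      val request = OffsetRequest(Map(tap -> PartitionOffsetRequestInfo(OffsetRequest.LatestTime, 1)))
      consumer.getOffsetsBefore(request).partitionErrorAndOffsets(tap).offsets.head
    } finally {
      consumer.close() // closing matters; see the channel-limit discussion later in this thread
    }
  }
}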

@jisookim0513
Contributor

@cvcal where would you like to have futures in the ActorModel? I'm not sure having an execution context in ActorModel is a good idea. Also, could you please squash the commits into one once you think you won't need to reset to a previous commit? It would be nice to keep the number of commits under 5 per PR, ideally 1 or 2.

@cvcal
Contributor Author

cvcal commented Aug 13, 2015

OK, I will squash everything; I just want to finish this first. I had initially thought you were referring to the most recent commits.

Currently, with a sufficient number of topics and partitions, having to create a SimpleConsumer for each can trigger timeouts, especially with calls like KSGetAllTopicDescriptions. I thought putting futures inside the returned value (within the topic description, and maybe within TopicIdentities too, though the from() method could handle the block-until-complete) would be the cleanest and most resilient solution. Is there another solution you would prefer? The KafkaStateActor would need an execution context regardless, even if ActorModel doesn't, since it would be placing the SimpleConsumer calls in futures.

@jisookim0513 Would this be acceptable? If so, is one way of getting an execution context preferable to another?
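
(A sketch of that idea with illustrative field names: the description carries the still-pending offset fetch as a Future, and from() is the one place that blocks.)

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Illustrative shapes only; the real case classes have many more fields.
case class TopicDescription(topic: String, partitionOffsets: Future[Map[Int, Long]])
case class TopicIdentity(topic: String, partitionOffsets: Map[Int, Long])

object TopicIdentity {
  // The block-until-complete lives here, at the edge, as described above.
  def from(td: TopicDescription): TopicIdentity =
    TopicIdentity(td.topic, Await.result(td.partitionOffsets, 10.seconds))
}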

@jisookim0513
Contributor

@cvcal sorry, I have been very busy with other issues, so I haven't had a chance to read through your entire PR. I will try to do it tonight, but for the execution context, you could consider changing KafkaStateActor to a LongRunningActor and using longRunningExecutionContext.
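
(Roughly what that mix-in could look like; member names such as longRunningPoolConfig and longRunningQueueFull are assumed from how the trait is used elsewhere in kafka-manager and may differ slightly.)

import scala.concurrent.Future

class KafkaStateActor(/* existing constructor args elided */)
  extends BaseQueryCommandActor with LongRunningPoolActor {

  // Dedicated pool for blocking work; the sizes here are illustrative.
  override protected def longRunningPoolConfig: LongRunningPoolConfig =
    LongRunningPoolConfig(threadPoolSize = 2, maxQueueSize = 100)

  override protected def longRunningQueueFull(): Unit =
    log.error("Long-running pool queue full, skipping offset refresh")

  // Existing query/command handling elided; only the offset refresh is sketched.
  private def refreshTopicOffsets(): Unit = {
    implicit val ec = longRunningExecutionContext // provided by the mixed-in trait
    Future {
      // blocking SimpleConsumer offset fetches go here, off the actor dispatcher
    }
  }
}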

@cvcal
Contributor Author

cvcal commented Aug 17, 2015

Ah, I hadn't noticed that LongRunningPoolActor was a trait and could be mixed with BaseQueryCommandActor, thanks!

I'll try to have my update out by tonight then. (Update: I might not have a version I'm happy with by tonight, because it currently creates too many futures and I haven't gotten it to work by grouping the calls by topic yet, but if you want to see the modification of KafkaStateActor I've been asking about, here's a rough version: cvcal@6609b65)

@jisookim0513
Contributor

Cool, thanks. Could you ping me when you have the version you like? I'd like to try it on my laptop as well.

@cvcal
Contributor Author

cvcal commented Aug 18, 2015

@jisookim0513: This version is better. It still has a problem with creating too many futures, because it asks for fresh offsets for a topic every time you request the topic description or the consumer description of any consumer reading from it; so if a topic gets consumed from a lot, the redundant work piles up and can approach the task limit. The fix for that is caching the results, and I'm working on it, but this should work for now.
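
(The caching fix could look roughly like this: remember the last fetch per topic and reuse it for a few seconds, so repeated description requests don't each spawn fresh SimpleConsumer calls. The names and the 5-second window are illustrative; the unindented snippet that follows afterwards is from the PR's diff, with an inline review comment.)

import scala.concurrent.Future

case class OffsetCacheEntry(offsets: Future[Map[Int, Long]], fetchedAtMillis: Long)

class TopicOffsetCache(fetchOffsets: String => Future[Map[Int, Long]],
                       maxAgeMillis: Long = 5000) {
  private var cache = Map.empty[String, OffsetCacheEntry]

  def offsetsFor(topic: String): Future[Map[Int, Long]] = synchronized {
    val now = System.currentTimeMillis()
    cache.get(topic) match {
      case Some(entry) if now - entry.fetchedAtMillis < maxAgeMillis =>
        entry.offsets // still fresh: reuse the cached Future
      case _ =>
        val fresh = fetchOffsets(topic) // at most one fetch per topic per window
        cache += topic -> OffsetCacheEntry(fresh, now)
        fresh
    }
  }
}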

case KSGetConsumedTopicDescription(consumer, topic) =>
  sender ! getConsumedTopicDescription(consumer, topic)

case KSGetConsumerDescriptions(consumers) =>
Contributor

Where is this used?

Contributor Author

You're right, KSGetConsumerDescriptions doesn't get used. I added it automatically when I was thinking of having a list of valid consumer groups that the user could curate, and I forgot to remove it.

@jisookim0513
Contributor

Hmm, I tried the latest version on your branch, and it worked with my local testing cluster, which had fewer than three topics. However, it doesn't seem to be working with a remote cluster that has quite a few topics and partitions. Caching and polling the information sounds like a good idea instead of fetching fresh data every time. However, I am a bit concerned that I wasn't able to see the topic list with the use of futures. Would it be insufficient if we didn't have the topic offset information on TopicDescription (and TopicIdentity)?

@patelh what do you think?

@cvcal
Contributor Author

cvcal commented Aug 19, 2015

It didn't work at all? I'm currently having a problem with Java IO exceptions; is the problem you're having linked to ClosedChannelExceptions? I'm trying to fix that right now, so if the problem you were having is unrelated, could you let me know?

@jisookim0513
Contributor

@cvcal with my local test cluster I remember having a problem with ClosedChannelExceptions, but I was able to see consumer information and topic offsets. For the remote cluster, a timeout occurred, so I couldn't access the topic list (only the broker list worked).

@cvcal
Contributor Author

cvcal commented Aug 19, 2015

Ok, that's what I thought. This should fix it.

If you disagree with my decision to put offsets in the consumer and topic identities, feel free to change it, but my internship ends this week, so I probably won't have a chance to implement any changes you suggest at this point. I do want to clean up a few things on the UI end that I had been neglecting while trying to resolve the bigger problem, so I'll squash all the commits once that is done.

If you're curious, the problem was that I was creating too many channels (one for each SimpleConsumer) and was running up against a Java limit. I had modeled the offset-fetching SimpleConsumer code after Kafka's GetOffsetShell, which did not close the connections it created, probably because it could get away with it. In this case, though, the limit was a problem.

The reason you could still see the broker list was that it was the only page that didn't try to get topic or consumer identities, so it relied only on the channels created for the Curator caches, which were still open. All other calls ended up requiring an update to the topic identities, which is what timed out, since no new consumers could be created. Caching delayed hitting the channel limit, since it creates fewer SimpleConsumers, and then adding consumer.close() fixed the underlying problem.

@cvcal cvcal force-pushed the master branch 2 times, most recently from 0bf93d5 to 93f04e1 on August 19, 2015 at 21:59
…s to the topics.

The KafkaStateActor now tracks the following additional information:
- adds a Curator TreeCache to cache the consumer information, which includes the names of the consumer instances that own each partition and the offsets at every partition.
- adds cached offsets for the topics. These must be fetched directly from Kafka, since they are not stored in ZooKeeper, so the call works by using a SimpleConsumer for every partition. Currently, the frequency of updates to these offsets is hard-coded to every 5 seconds or less often.
- adds a ClusterConfig option to exclude, as "inactive", any consumer that does not have offsets/, owners/, AND ids/ subdirectories.

The UI now has a consumer list view, which lists all consumers and the topics they consume from; an individual page for each consumer that lists each topic with the cumulative lag and the percentage of partitions that have an owner (if all is well, this should be 100%); and a page for every consumed topic that lists the offset, lag, and owner information at every partition for that consumer-topic combination. The topic views now also show the most recent offset at every partition and the cumulative log size, as well as links to the consumed state of the topic inside a list of the consumers that consume from it.
@cvcal
Contributor Author

cvcal commented Aug 19, 2015

So I cleaned up the UI a little bit and updated the README with the new images; here they are for easy access:


First, here is the ConsumerListView:
[screenshot: consumer-list]

The ConsumedTopicView:
[screenshot: consumed-topic]

And the alteration to the TopicView:
[screenshot: topic]

@patelh
Collaborator

patelh commented Aug 26, 2015

Please rebase off master. Let me know if you need help (please add me as a contributor to your repo so I can push to your branch).

@patelh
Collaborator

patelh commented Aug 27, 2015

I've started fixing merge conflicts on my local branch.

@cvcal
Contributor Author

cvcal commented Aug 27, 2015

Thanks @patelh! I'm also no longer a contributor to Knewton's repo, but I'll see what I can do.

@cvcal
Contributor Author

cvcal commented Sep 4, 2015

@patelh If Knewton were to restore my write access temporarily, could you give me access to where you were fixing the conflicts, so that I can push it to this repo?
Thanks for helping out with this!

@cvcal
Contributor Author

cvcal commented Sep 12, 2015

I don't know if this helps, but if you push your merge conflict fixes to my repository, I can ask Knewton to accept my pull request to their fork. I'm sorry this is more complicated and slow than it could have been. Thanks for your patience, @patelh

@patelh
Collaborator

patelh commented Sep 20, 2015

I've pushed a new branch to your repo. Please push it to knewton, or close this PR and create a new one, or I can create a new PR with your branch (your commits would be preserved).

@cvcal cvcal mentioned this pull request Sep 20, 2015
@patelh
Collaborator

patelh commented Sep 22, 2015

@cvcal It may be faster to create a new pull request to merge the cvcal branch from your repo to yahoo's. Please just create a new PR.

@cvcal
Contributor Author

cvcal commented Sep 22, 2015

I agree; I wanted to see if it could happen quickly enough to be worth it, but I doubt it.
I created #131 instead.

@cvcal cvcal closed this Sep 22, 2015
patelh added a commit that referenced this pull request Sep 23, 2015
Add Consumer group data, with offsets and lag - #108 but direct