
Producer causes high CPU load #579

Closed

affitz opened this issue Mar 9, 2016 · 17 comments

affitz commented Mar 9, 2016

I'm using the latest asynchronous KafkaProducer and my app is producing several hundred Kafka records per second. At about 500 records/second (which is actually not much), one of the processes of the async producer uses more than 50% of one of my cores.

I don't see such high CPU usage, even at higher rates, when using the Java producer. The Kafka server is version 0.8.1.

affitz (Author) commented Mar 9, 2016

Any idea why?

affitz (Author) commented Mar 11, 2016

Did some more testing and it seems that CPU load is only high when I don't provide a key to the send method. Partitioner is the default one.

dpkp (Owner) commented Mar 11, 2016

That sounds like it is caused by something in the partitioning code. Are you using the default partitioner? If so, I would expect it to perform better without a key, not worse:

In [1]: from kafka.partitioner.default import DefaultPartitioner

In [2]: partitioner = DefaultPartitioner()

In [3]: all_ = available = range(12)

In [4]: %timeit partitioner('foo', all_, available)
The slowest run took 6.64 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 3.02 µs per loop

In [5]: %timeit partitioner(None, all_, available)
The slowest run took 16.60 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 661 ns per loop
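For context, a simplified stand-in for the partitioning logic being timed above (illustrative only, not the library's exact code): hash the key when one is given, otherwise pick a random available partition. The real DefaultPartitioner uses murmur2 rather than Python's built-in hash().

```python
import random

def simple_partitioner(key, all_partitions, available):
    """Simplified sketch of kafka-python's DefaultPartitioner behavior:
    hash the key when one is given, otherwise pick a random
    available partition. Illustrative only."""
    if key is None:
        return random.choice(available or all_partitions)
    # The real implementation uses murmur2; hash() is a placeholder here.
    return all_partitions[hash(key) % len(all_partitions)]

all_ = available = list(range(12))
p = simple_partitioner(b'foo', all_, available)
```

The keyless path does no hashing at all, which is why it benchmarks faster above.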

affitz (Author) commented Mar 11, 2016

Update:
I had to remove the key again, because with a key the producer stopped sending messages to Kafka; at least, nothing ended up in the topic. That is probably why CPU usage dropped when I passed a key. The key was simply an int counter, so it should be serializable to bytes. The producer did not raise any exceptions either; there were just no new messages in the topic. After removing the key, messages were published again. So the partitioner doesn't seem to be the problem; sorry for the misleading post above.

That's how I setup the producer:

self._producer = KafkaProducer(bootstrap_servers=hosts,
                               client_id=client_id,
                               metadata_max_age_ms=10000)

hosts is a list of 6 ip:port pairs.

Messages are simply being sent with self._producer.send(self._topic, msg) where msg is a schema-less avro record (string).

We have around 30 producers running on two hosts. When I disable publishing to Kafka by commenting out the call to send, CPU load is about 0-10% per script. As soon as I start publishing, CPU load shoots up to 100% for the producers handling high throughput (500-1500 avro records per second at most).
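One plausible explanation for the silent failure with an int key: without a key_serializer, the producer expects keys already encoded as bytes, and send errors may only surface on the future that send() returns rather than as immediate exceptions. A hedged sketch of explicitly encoding an int counter as a key (the encoding convention is illustrative, not prescribed by Kafka):

```python
import struct

def int_key_to_bytes(counter):
    # Big-endian signed 8-byte encoding; one common but arbitrary
    # convention for turning an int counter into a bytes key.
    return struct.pack('>q', counter)

key = int_key_to_bytes(42)
```

This could then be wired in either as `KafkaProducer(..., key_serializer=int_key_to_bytes)` or applied by hand at the call site, e.g. `producer.send(self._topic, value=msg, key=int_key_to_bytes(counter))`. Calling `.get(timeout=...)` on the future returned by send() would surface any serialization or delivery error that otherwise stays invisible.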

dpkp (Owner) commented Mar 11, 2016

interesting -- your metadata_max_age_ms is very low (10 seconds?). Do you get the same behavior if you use the default (5 mins)?

affitz (Author) commented Mar 11, 2016

Just tried it, same behaviour.

Setting logging level to INFO shows this at the beginning:

Broker is not v0.9 -- it did not recognize ListGroupsRequest
Broker version identifed as 0.8.2

I also set it to DEBUG, but then my terminal gets flooded with messages. Scanning the output, I always see the same pattern: after sending huge produce requests (I guess it buffers the messages and sends them in a bulk request) for each partition, it gets a couple of ProduceResponses with error_code=0, followed by a couple of these lines:

Sending (key=None value=...
...
Allocating a new 16384 byte message buffer for TopicPartition(topic='data-1', partition=0)
Waking up the sender since TopicPartition(topic='data-1', partition=0) is either full or getting a new batch
Sending (key=None value=...
...

and then finally

Nodes with data ready to send: set([1, 2, 3, 4, 5, 6])
Created 6 produce requests: [huge bulk request]

with a couple requests waiting to be sent. Then it starts all over again with sending the produce requests.

Looks pretty normal to me, but CPU goes nuts :-/
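As an aside, the DEBUG flood can be scoped to just kafka-python's loggers instead of the root logger, since the library logs under module-path logger names. A sketch (the `kafka.producer.sender` name assumes the logger follows the module layout):

```python
import logging

# Keep everything else quiet while watching kafka-python internals.
logging.basicConfig(level=logging.WARNING)
logging.getLogger('kafka').setLevel(logging.INFO)

# Narrow further to just the producer's sender thread if needed
# (logger name assumed to match the module path).
logging.getLogger('kafka.producer.sender').setLevel(logging.DEBUG)
```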

zackdever (Collaborator) commented

I've noticed that there is a high baseline CPU usage around 500-1500 msg/sec, but that I can go up to about 10x that before the CPU pegs. At the lower end around 500-1500, I found setting linger_ms=10 cut the CPU usage almost in half. It might not change anything here since your logging suggests that it might already be batching, but it's something you could try.

I submitted #598 which might also help a little.
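For reference, the batching knobs mentioned above would be configured roughly like this (hosts and values are placeholders, not tuned recommendations):

```python
# Illustrative producer settings; hosts and values are placeholders.
producer_config = dict(
    bootstrap_servers=['host1:9092', 'host2:9092'],
    linger_ms=10,      # wait up to 10 ms so sends coalesce into batches
    batch_size=16384,  # per-partition batch buffer size in bytes (the default)
)
# producer = KafkaProducer(**producer_config)  # needs a reachable broker
```

A nonzero linger_ms trades a little latency for fewer, larger produce requests, which reduces per-message wakeup and syscall overhead.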

dpkp (Owner) commented Mar 17, 2016

@affitz have you had a chance to try the latest release, 1.0.2 ?

affitz (Author) commented Mar 17, 2016

Yes, just deployed the latest version from github (1.0.3?). Nothing changed :-(

dpkp (Owner) commented Mar 17, 2016

How big are the messages?

affitz (Author) commented Mar 17, 2016

We have two message types:
len(msg1) = 95
len(msg2) = 109

dpkp (Owner) commented Mar 17, 2016

how are you running the 30 producers per host? separate processes? multiprocessing? threading?

affitz (Author) commented Mar 17, 2016

Separate processes. I have 30 sensors and for each sensor I execute a collector script separately. It simply connects to the sensor, collects the continuous stream of data and feeds it to Kafka.

dpkp (Owner) commented Mar 17, 2016

more qs: what version of python are you using, and are you using any serialization or compression?

affitz (Author) commented Mar 17, 2016

np, I appreciate the efforts. I'm using Python 2.7.11. I already get the data as avro records from the sensors and just forward them to Kafka with producer.send(self._topic, msg). I also don't use any compression. I kept this Kafka proxy as simple as possible.

dpkp (Owner) commented May 22, 2016

Have you been able to resolve this? Benchmarks on the latest release show about 10,000 msgs/sec with a single producer and ~100 byte messages.

dpkp self-assigned this Jul 16, 2016

dpkp (Owner) commented Jul 17, 2016

There are benchmark / load testing scripts in benchmarks/ . Local tests on my laptop (MBP) w/ kafka-python 1.2.5 show a producer sending ~15K messages per second using a single CPU at 100%. I think this is acceptable. Please re-open if you are unable to duplicate.
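A rough sketch of the kind of throughput measurement those benchmark scripts perform (here with a stub send so it runs without a broker; names are illustrative):

```python
import time

def measure_throughput(send, n=100_000, payload=b'x' * 100):
    """Return messages/sec for n calls to send with ~100-byte payloads."""
    start = time.time()
    for _ in range(n):
        send(payload)
    return n / (time.time() - start)

# With a stub this only measures loop overhead; against a real producer
# you would pass something like: lambda m: producer.send('bench-topic', m)
rate = measure_throughput(lambda m: None)
```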

@dpkp dpkp closed this as completed Jul 17, 2016
tanaypf9 pushed a commit to tanaypf9/pf9-requirements that referenced this issue May 20, 2024
Patch Set 1:

Some of the issues I'm concerned about:

dpkp/kafka-python#674

dpkp/kafka-python#686

dpkp/kafka-python#579

dpkp/kafka-python#551

Patch-set: 1