Producer causes high CPU load #579
Any idea why?
Did some more testing, and it seems that CPU load is only high when I don't provide a key to the send method. The partitioner is the default one.
That sounds like it is caused by something in the partitioning code. Are you using the default partitioner? If so, I would expect it to perform better without a key, not worse.
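For reference, here is how the keyless code path in a default-style partitioner typically works (a minimal sketch, not kafka-python's actual implementation; `murmur2_hash` is a placeholder name for whatever keyed hash the real partitioner uses):

```python
import random

def murmur2_hash(data):
    # Placeholder for a real murmur2 implementation; any stable
    # hash over bytes is enough for this sketch.
    return hash(data)

def partition(key, all_partitions, available_partitions):
    """Pick a partition the way a default partitioner typically does.

    With no key, a random available partition is chosen, which is a
    cheap operation -- this is why keyless sends would normally be
    expected to be faster, not slower.
    """
    if key is None:
        if available_partitions:
            return random.choice(available_partitions)
        return random.choice(all_partitions)
    # With a key, hash it and map deterministically onto a partition
    # so that the same key always lands in the same place.
    idx = murmur2_hash(key) & 0x7FFFFFFF
    return all_partitions[idx % len(all_partitions)]
```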
Update: that's how I set up the producer:

```python
self._producer = KafkaProducer(bootstrap_servers=hosts,
                               client_id=client_id,
                               metadata_max_age_ms=10000)
```

`hosts` is an array of 6 ip:port pairs. Messages are simply being sent with `send`. We have around 30 producers running on two hosts. When I disable publishing messages to Kafka by commenting out the above call to send, I have a CPU load of about 0-10% per script. As soon as I start publishing the messages, CPU load goes up to 100% for those producers processing high throughput (500-1500 Avro records per second max).
Interesting -- your metadata_max_age_ms is very low (10 seconds?). Do you get the same behavior if you use the default (5 minutes)?
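For context, `metadata_max_age_ms` controls how long cached cluster metadata is considered fresh before the client forces a refresh, so a low value means much more background metadata traffic. A quick sketch of the difference between the two settings discussed here:

```python
# metadata_max_age_ms: how long cached cluster metadata is considered
# fresh before the client proactively refreshes it from a broker.
DEFAULT_METADATA_MAX_AGE_MS = 300_000  # 5 minutes (library default)
ISSUE_METADATA_MAX_AGE_MS = 10_000     # 10 seconds (reporter's setting)

def refreshes_per_hour(metadata_max_age_ms):
    """Roughly how many forced metadata refreshes happen per hour."""
    return 3_600_000 // metadata_max_age_ms
```

With the reporter's setting this comes out to 360 refreshes per hour instead of 12, a 30x increase in metadata traffic, though as the follow-up below shows, it turned out not to be the cause here.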
Just tried it, same behaviour. Setting the logging level to INFO shows this at the beginning:
I also set it to DEBUG, but my terminal gets flooded with messages then. Scanning the output, I always see the same pattern: after sending huge produce requests (I guess it buffers the messages and then sends them in a bulk request) for each partition, it gets a couple of ProduceResponses with error_code=0, followed by a couple of these lines:
and then finally
with a couple of requests waiting to be sent. Then it starts all over again with sending the produce requests. Looks pretty normal to me, but CPU goes nuts :-/
I've noticed that there is high baseline CPU usage around 500-1500 msg/sec, but I can go up to about 10x that before the CPU pegs. At the lower end around 500-1500, I found setting … I submitted #598, which might also help a little.
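The producer settings most commonly tuned to cut per-message CPU overhead at these rates are `linger_ms`, `batch_size`, and `compression_type` (this is a generic tuning sketch using real KafkaProducer keywords, not the specific setting elided in the comment above; the host name is hypothetical):

```python
# Configuration aimed at amortizing per-request cost over many records.
producer_config = dict(
    bootstrap_servers=["kafka1:9092"],  # hypothetical broker address
    linger_ms=50,           # wait up to 50 ms so sends batch together
    batch_size=64 * 1024,   # larger per-partition batch buffer (bytes)
    compression_type="gzip" # fewer bytes per produce request
)

def requests_needed(messages, records_per_batch):
    """Produce requests required for a message count at a given batch
    size in records. Bigger batches mean fewer requests and less
    per-message work in the I/O loop."""
    return -(-messages // records_per_batch)  # ceiling division
```

At 1500 msg/sec, batching 100 records per request drops the request rate from 1500/sec to 15/sec, which is where most of the CPU savings would come from.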
@affitz have you had a chance to try the latest release, 1.0.2?
Yes, just deployed the latest version from GitHub (1.0.3?). Nothing changed :-(
How big are the messages?
We have two message types:
How are you running the 30 producers per host? Separate processes, multiprocessing, or threading?
Separate processes. I have 30 sensors, and for each sensor I execute a collector script separately. It simply connects to the sensor, collects the continuous stream of data, and feeds it to Kafka.
More questions: what version of Python are you using, and are you using any serialization or compression?
np, I appreciate the efforts. I'm using Python 2.7.11. I already get the data as Avro records from the sensors and just forward them to Kafka with
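Forwarding already-encoded Avro records means no value serializer should be configured, so the producer passes the bytes through untouched. A sketch of that producer-side serialization step (`prepare_value` is an illustrative helper, not a kafka-python function):

```python
def prepare_value(value, value_serializer=None):
    """Mirror the producer-side serialization step: with no
    value_serializer configured, bytes pass through as-is, so
    pre-encoded Avro records add no serialization cost per send."""
    if value_serializer is not None:
        return value_serializer(value)
    if value is not None and not isinstance(value, bytes):
        raise TypeError("without a serializer, values must be bytes")
    return value
```

If serialization cost were the issue, moving encoding out of the send path like this would show up immediately in the CPU profile; here the records arrive pre-encoded, which rules that out.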
Have you been able to resolve this? Benchmarks on the latest release show about 10,000 msgs/sec with a single producer and ~100 byte messages.
There are benchmark / load testing scripts in
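A minimal throughput check along those lines (a standalone sketch, not the repository's benchmark script; it times an arbitrary `send` callable over ~100-byte payloads so the client-side cost can be measured in isolation):

```python
import time

def measure_rate(send, num_messages=10_000, payload=b"x" * 100):
    """Time `send` over num_messages ~100-byte payloads and return
    the achieved rate in messages per second."""
    start = time.perf_counter()
    for _ in range(num_messages):
        send(payload)
    elapsed = time.perf_counter() - start
    return num_messages / elapsed
```

Passing `lambda m: producer.send("topic", m)` against a real broker would reproduce the benchmark setup described above; passing a no-op instead isolates pure loop overhead for comparison.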
I'm using the latest asynchronous KafkaProducer, and my app is producing several hundred Kafka records per second. At about 500 records/second (which is actually not much), one of the async producer's processes uses more than 50% of one of my cores.
I don't see such high CPU usage at even higher rates when using the Java producer. The Kafka server is version 0.8.1.