
Use max_wait_time while sleeping in fetch_batches #825


@ElenaHenderson (Contributor) commented Apr 28, 2020

This is an attempt to fix #823, where the consumer sleeps for 2 seconds, causing an average 1.5 second pause between when a message is published and when a consumer starts processing it.

Note that my fix is based on research done by https://github.com/paneq in the above issue.

@paneq commented Apr 28, 2020

@ElenaHenderson let us know when this is ready for testing and review

@ElenaHenderson (Contributor, Author) commented:

@paneq @dasch This is ready for review and testing!

Thank you so much!

@dasch (Contributor) left a comment


Code looks good to me. @paneq can you verify that this actually resolves the issue?

@paneq commented Apr 29, 2020

@dasch some comments from me:

a) It resolves the latency issue

b) I had to explicitly add require "concurrent/map" to my test project. Otherwise, I experienced:

Traceback (most recent call last):
	3: from consume.rb:8:in `<main>'
	2: from /Users/rupert/.rbenv/versions/2.5.5/lib/ruby/gems/2.5.0/bundler/gems/ruby-kafka-9ed5ecc4ebb9/lib/kafka/client.rb:397:in `consumer'
	1: from /Users/rupert/.rbenv/versions/2.5.5/lib/ruby/gems/2.5.0/bundler/gems/ruby-kafka-9ed5ecc4ebb9/lib/kafka/client.rb:397:in `new'
/Users/rupert/.rbenv/versions/2.5.5/lib/ruby/gems/2.5.0/bundler/gems/ruby-kafka-9ed5ecc4ebb9/lib/kafka/consumer.rb:79:in `initialize': uninitialized constant Kafka::Consumer::Concurrent (NameError)

c) I am worried that we are not adding any test here to cover this situation. Does this gem have any end-to-end tests that could be used to measure latency, so that a similar problem is not introduced in the future?

d) Honestly, I still fail to understand how this gem (or Kafka) works internally.

Particularly, I don't understand why, between fetch requests that return data from the last 0.1s (the configured max_wait_time), there are a couple of fetch requests that return in 0.01s without any data.

Code:

# producer
loop do
  kafka.deliver_message({time: Time.now.to_f}.to_json, topic: "greetings", key: "Hello")
  puts "delivered"
  sleep(0.005)
end

# consumer
consumer.each_message(min_bytes: 1, max_wait_time: 0.1) do |message|
  now = Time.now.to_f
  sent = JSON.parse(message.value)['time']
  diff = now - sent
  puts("diff: #{diff.round(2)}")
end

Here is my output:

D, [2020-04-29T11:34:54.104006 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.104115 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5911 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.104297 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5911 from 127.0.0.1:9092
diff: 0.1
D, [2020-04-29T11:34:54.105206 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33572 as processed
diff: 0.09
D, [2020-04-29T11:34:54.105330 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33573 as processed
diff: 0.08
D, [2020-04-29T11:34:54.105477 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33574 as processed
diff: 0.07
D, [2020-04-29T11:34:54.105678 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33575 as processed
diff: 0.06
D, [2020-04-29T11:34:54.106100 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33576 as processed
diff: 0.05
D, [2020-04-29T11:34:54.106418 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33577 as processed
diff: 0.04
D, [2020-04-29T11:34:54.106512 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33578 as processed
diff: 0.02
D, [2020-04-29T11:34:54.106587 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33579 as processed
D, [2020-04-29T11:34:54.106679 #93782] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T11:34:54.114277 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5911 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.114357 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.114455 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5912 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.114574 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5912 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.126082 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5912 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.126250 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.126412 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5913 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.126555 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5913 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.137080 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5913 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.137184 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.137334 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5914 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.137964 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5914 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.147415 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5914 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.147517 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.147607 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5915 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.147718 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5915 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.163048 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5915 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.163192 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.163489 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5916 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.163624 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5916 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.174798 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5916 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.174903 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.174995 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5917 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.175095 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5917 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.183879 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5917 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.184045 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.184278 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5918 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.184418 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5918 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.194192 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5918 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.194304 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.194414 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5919 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.194528 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5919 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.204990 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5919 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.205163 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.205364 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5920 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.205714 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5920 from 127.0.0.1:9092
diff: 0.1
D, [2020-04-29T11:34:54.210715 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33580 as processed
diff: 0.09
D, [2020-04-29T11:34:54.210814 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33581 as processed
diff: 0.08
D, [2020-04-29T11:34:54.210882 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33582 as processed
diff: 0.07
D, [2020-04-29T11:34:54.210948 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33583 as processed
diff: 0.06
D, [2020-04-29T11:34:54.211048 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33584 as processed
diff: 0.04
D, [2020-04-29T11:34:54.211111 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33585 as processed
diff: 0.03
D, [2020-04-29T11:34:54.211175 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33586 as processed
diff: 0.02
D, [2020-04-29T11:34:54.211221 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33587 as processed
diff: 0.01
D, [2020-04-29T11:34:54.211264 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33588 as processed
D, [2020-04-29T11:34:54.211320 #93782] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T11:34:54.216002 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5920 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.216083 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.216275 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5921 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.216390 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5921 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.226639 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5921 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.226800 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.226950 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5922 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.227164 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5922 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.235370 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5922 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.235492 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.235638 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5923 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.235855 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5923 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.246979 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5923 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.247136 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.247246 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5924 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.247342 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5924 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.261146 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5924 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.261315 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.261455 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5925 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.261656 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5925 from 127.0.0.1:9092
diff: 0.1
D, [2020-04-29T11:34:54.313068 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33589 as processed
diff: 0.09
D, [2020-04-29T11:34:54.313194 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33590 as processed
diff: 0.08
D, [2020-04-29T11:34:54.313316 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33591 as processed
diff: 0.07
D, [2020-04-29T11:34:54.313437 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33592 as processed
diff: 0.06
D, [2020-04-29T11:34:54.313551 #93782] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33593 as processed
D, [2020-04-29T11:34:54.313614 #93782] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T11:34:54.365806 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 5925 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.365976 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T11:34:54.366127 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 5926 to 127.0.0.1:9092
D, [2020-04-29T11:34:54.366261 #93782] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 5926 from 127.0.0.1:9092
D, [2020-04-29T11:34:54.416900 #93782] DEBUG -- : [[my-consumer] {}:] No batches to process

If I change my producer to produce less often:

loop do
  kafka.deliver_message({time: Time.now.to_f}.to_json, topic: "greetings", key: "Hello")
  puts "delivered"
  sleep(0.2)
end

and keep the consumer the same:

consumer.each_message(min_bytes: 1, max_wait_time: 0.1) do |message|
  now = Time.now.to_f
  sent = JSON.parse(message.value)['time']
  diff = now - sent
  puts("diff: #{diff.round(2)}")
end

Then there are fewer of those fetch requests between actual processing.

diff: 0.05
D, [2020-04-29T12:00:09.529827 #94065] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33659 as processed
D, [2020-04-29T12:00:09.529976 #94065] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T12:00:09.589381 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 60 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.589623 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T12:00:09.589891 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 61 to 127.0.0.1:9092
D, [2020-04-29T12:00:09.590197 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 61 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.630621 #94065] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T12:00:09.690071 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 61 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.690197 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T12:00:09.690303 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 62 to 127.0.0.1:9092
D, [2020-04-29T12:00:09.690427 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 62 from 127.0.0.1:9092
diff: 0.05
D, [2020-04-29T12:00:09.731050 #94065] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:33660 as processed
D, [2020-04-29T12:00:09.731159 #94065] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T12:00:09.793750 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 62 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.793856 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T12:00:09.794026 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 63 to 127.0.0.1:9092
D, [2020-04-29T12:00:09.794211 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 63 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.831838 #94065] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-29T12:00:09.898081 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 63 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.898400 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T12:00:09.898755 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 64 to 127.0.0.1:9092
D, [2020-04-29T12:00:09.899162 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 64 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.903473 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 64 from 127.0.0.1:9092
D, [2020-04-29T12:00:09.903659 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-29T12:00:09.903808 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 65 to 127.0.0.1:9092
D, [2020-04-29T12:00:09.903943 #94065] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 65 from 127.0.0.1:9092
diff: 0.04

Do I understand correctly that there is one thread doing data fetching, and a different thread doing data processing? And that the processing thread sleeps when there is nothing to process, and while it sleeps we might get data from multiple fetch requests (because of min_bytes: 1)?

@dasch (Contributor) commented Apr 29, 2020

a) It resolves the latency issue

Awesome!

b) I had to explicitly add require "concurrent/map" to my test project. Otherwise, I experienced:

Traceback (most recent call last):
	3: from consume.rb:8:in `<main>'
	2: from /Users/rupert/.rbenv/versions/2.5.5/lib/ruby/gems/2.5.0/bundler/gems/ruby-kafka-9ed5ecc4ebb9/lib/kafka/client.rb:397:in `consumer'
	1: from /Users/rupert/.rbenv/versions/2.5.5/lib/ruby/gems/2.5.0/bundler/gems/ruby-kafka-9ed5ecc4ebb9/lib/kafka/client.rb:397:in `new'
/Users/rupert/.rbenv/versions/2.5.5/lib/ruby/gems/2.5.0/bundler/gems/ruby-kafka-9ed5ecc4ebb9/lib/kafka/consumer.rb:79:in `initialize': uninitialized constant Kafka::Consumer::Concurrent (NameError)

Does that only happen on this branch or also on master?

c) I am worried that we are not adding any test here to cover this situation. Does this gem have any end-to-end tests that could be used to measure latency, so that a similar problem is not introduced in the future?

There are no tests for latency. There are end-to-end tests, but it can be difficult to make them reliable because Kafka can be a bit unpredictable.

d) Honestly, I still fail to understand how this gem (or Kafka) works internally.

Particularly, I don't understand why, between fetch requests that return data from the last 0.1s (the configured max_wait_time), there are a couple of fetch requests that return in 0.01s without any data.

Do I understand correctly that there is one thread doing data fetching, and a different thread doing data processing? And that the processing thread sleeps when there is nothing to process, and while it sleeps we might get data from multiple fetch requests (because of min_bytes: 1)?

That's exactly right. There's a buffer between the processor and the fetcher that gets pushed to even when the processor is busy – this greatly improves throughput. The sleep was done in order to "back off" when there's nothing to process, to avoid pegging the CPU. The sleep duration was simply too high; making it match max_wait_time makes much more sense.
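A minimal Ruby sketch of the fetcher/processor split described above (the names and shapes here are invented for illustration, not the gem's actual internals): one thread fetches batches into a shared queue, while the processing loop pops from it and backs off when the queue is empty.

```ruby
# Illustrative sketch only: a fetcher thread fills a buffer even while
# the processor is busy; the processor sleeps (backs off) when the
# buffer is empty instead of spinning on the CPU.
queue = Queue.new
max_wait_time = 0.1 # seconds; the backoff now matches this setting

fetcher = Thread.new do
  5.times do |i|
    sleep(0.02)            # stand-in for a fetch round-trip
    queue << "batch-#{i}"  # buffer fills regardless of processor state
  end
end

processed = []
while processed.size < 5
  if queue.empty?
    sleep(max_wait_time)   # back off instead of pegging the CPU
  else
    processed << queue.pop
  end
end

fetcher.join
puts processed.inspect # prints ["batch-0", "batch-1", "batch-2", "batch-3", "batch-4"]
```

With the old fixed 2-second backoff, a message arriving just after the processor went to sleep would wait out the whole sleep; tying the backoff to max_wait_time bounds that pause by the same knob that bounds the fetch itself.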

@ElenaHenderson (Contributor, Author) commented:

@paneq Thank you for testing this change. I am ready to merge whenever you give me the green light and are satisfied with everything.

@paneq commented Apr 30, 2020

Does that only happen on this branch or also on master?

Only on this branch.

There are no tests for latency. There are end-to-end tests, but it can be difficult to make them reliable because Kafka can be a bit unpredictable.

Let's try not to add them in this fix, but would you be OK with having them in general if someone could make them reliably work?

That's exactly right.

Got it. In that case, I think the sleep should be 1/Nth of max_wait_time. I am not sure what N would be good (2, 3, 4, 5? dunno). Since max_wait_time is (obviously lol) the max time we can wait, there is a chance that new data will arrive earlier; max is the worst-case scenario. In fact, for my use cases where latency is more important than throughput, I could imagine setting min_bytes: 1, max_wait_time: 10s. But as soon as something is there to get, I want it to be fetched and processed, yet now I am waiting quite long for the sleep to finish. So we don't know whether the answer from Kafka will be here in 0.1s or in 6s; we should regularly check if it is there. I am not sure what the best way to achieve that is, but I believe we are coupling two settings which should be distinct.
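The point about long max_wait_time values can be sketched in a few lines (all names here are illustrative, not ruby-kafka's API): even if the worst-case wait is 10 seconds, polling the buffer at a small fixed interval means data is picked up almost as soon as it arrives.

```ruby
# Illustrative only: a large max_wait_time is a worst-case bound, so
# the idle check should poll at a small, independent interval rather
# than sleeping for the full worst case.
max_wait_time = 10.0  # worst-case broker wait
poll_interval = 0.005 # proposed small, decoupled sleep

buffer = Queue.new
Thread.new do
  sleep(0.05)         # data arrives long before max_wait_time elapses
  buffer << :message
end

waited = 0.0
while buffer.empty? && waited < max_wait_time
  sleep(poll_interval)
  waited += poll_interval
end

puts "data available after ~#{waited.round(3)}s of polling"
```

Sleeping for the full max_wait_time instead would leave that message sitting in the buffer for up to 10 seconds, which is exactly the coupling being questioned.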

@dasch (Contributor) commented Apr 30, 2020

Does that only happen on this branch or also on master?

Only on this branch.

Do you have a small reproducible example?

There are no tests for latency. There are end-to-end tests, but it can be difficult to make them reliable because Kafka can be a bit unpredictable.

Let's try not to add them in this fix, but would you be OK with having them in general if someone could make them reliably work?

I would love that!

That's exactly right.

Got it. In that case, I think the sleep should be 1/Nth of max_wait_time. I am not sure what N would be good (2, 3, 4, 5? dunno). Since max_wait_time is (obviously lol) the max time we can wait, there is a chance that new data will arrive earlier; max is the worst-case scenario. In fact, for my use cases where latency is more important than throughput, I could imagine setting min_bytes: 1, max_wait_time: 10s. But as soon as something is there to get, I want it to be fetched and processed, yet now I am waiting quite long for the sleep to finish. So we don't know whether the answer from Kafka will be here in 0.1s or in 6s; we should regularly check if it is there. I am not sure what the best way to achieve that is, but I believe we are coupling two settings which should be distinct.

What specific sleep duration would you want?

@dasch merged commit 703bf80 into zendesk:master on May 5, 2020
@paneq commented May 5, 2020

What specific sleep duration would you want?

@dasch I finally had a moment to calculate it with a very sophisticated algorithm ;)

On my MacBook Pro (which I know is not representative):

  • loop{1+1; sleep(0.005) } leads to 1% CPU usage
  • loop{1+1; sleep(0.01) } leads to 0.5% CPU usage

So I believe the sleep should be way lower than max_wait_time, and ideally configurable; we could go with 0.005 as the default value.

@paneq commented May 8, 2020

@dasch I know this PR is merged, but what do you think about my previous comment?

@dasch (Contributor) commented May 11, 2020

I think 0.005 is way too low. We have dozens of these processes running, also on our laptops as part of our dev environment. I would be happy with a PR that adds an optional configuration key for this, but using max_wait_time as the default makes sense to me.
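Such an optional configuration key might look like this (idle_sleep_time is an invented name for the sake of the sketch, not part of ruby-kafka's actual API): fall back to max_wait_time when the key is unset, and let latency-sensitive users opt in to a lower value.

```ruby
# Hypothetical shape of the suggested config key: the idle backoff
# defaults to max_wait_time unless explicitly overridden.
def idle_backoff(max_wait_time:, idle_sleep_time: nil)
  idle_sleep_time || max_wait_time
end

puts idle_backoff(max_wait_time: 0.1)                          # prints 0.1
puts idle_backoff(max_wait_time: 0.1, idle_sleep_time: 0.005)  # prints 0.005
```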
