
Make kafka version configurable #5046

Merged

Conversation

@ningyougang ningyougang commented Jan 18, 2021

  • Make the Kafka version configurable
  • Use the latest Kafka version by default: 2.7.0

The latest Kafka version, 2.7.0, fixes this bug: https://issues.apache.org/jira/browse/KAFKA-8334

We hit this bug in our environment; after deploying the latest Kafka version on our PMs, the problem was gone.

Description

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@@ -150,7 +150,7 @@ kafka:
     protocols:
     - TLSv1.2
   protocol: "{{ kafka_protocol_for_setup }}"
-  version: 2.12-2.3.1
+  version: "{{ kafka_version | default('2.13-2.7.0') }}"
We need to check that this does not introduce any regression.

style95 commented Jan 18, 2021

Just for those who visit this PR: that issue makes committing offsets take more than 5 seconds, which would dramatically increase the wait time of certain activations.


codecov-io commented Jan 18, 2021

Codecov Report

Merging #5046 (5cde872) into master (6686820) will decrease coverage by 7.34%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #5046      +/-   ##
==========================================
- Coverage   82.52%   75.17%   -7.35%     
==========================================
  Files         206      211       +5     
  Lines       10006    10454     +448     
  Branches      445      471      +26     
==========================================
- Hits         8257     7859     -398     
- Misses       1749     2595     +846     
Impacted Files Coverage Δ
...core/database/cosmosdb/RxObservableImplicits.scala 0.00% <0.00%> (-100.00%) ⬇️
...ore/database/cosmosdb/cache/CacheInvalidator.scala 0.00% <0.00%> (-100.00%) ⬇️
...e/database/cosmosdb/cache/ChangeFeedConsumer.scala 0.00% <0.00%> (-100.00%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0.00% <0.00%> (-95.85%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0.00% <0.00%> (-93.90%) ⬇️
...tabase/cosmosdb/cache/CacheInvalidatorConfig.scala 0.00% <0.00%> (-92.31%) ⬇️
...enwhisk/connector/kafka/KamonMetricsReporter.scala 0.00% <0.00%> (-83.34%) ⬇️
...e/database/cosmosdb/cache/KafkaEventProducer.scala 0.00% <0.00%> (-78.58%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0.00% <0.00%> (-74.08%) ⬇️
...ore/database/azblob/AzureBlobAttachmentStore.scala 11.53% <0.00%> (-60.58%) ⬇️
... and 25 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6686820...5cde872.

@rabbah rabbah left a comment

lgtm

ningyougang commented Jan 28, 2021

Any comments?

I benchmarked against wurstmeister/kafka:2.13-2.7.0 and it worked well. Results are below.

Env

Env: 3 PM nodes (128 GB memory; 40-core CPU)
kafka version: wurstmeister/kafka:2.13-2.7.0

Benchmark test

I ran the producer-side and consumer-side scripts at the same time in the kafka0 container.

  • Producer side
# send 50,000,000 records
/opt/kafka_2.13-2.7.0/bin/kafka-producer-perf-test.sh --topic scheduler0 --num-records 50000000  --record-size 100  --throughput -1 --producer-props acks=0 bootstrap.servers=10.105.65.214:9093,10.105.65.215:9094,10.105.65.216:9095
4077415 records sent, 815483.0 records/sec (77.77 MB/sec), 0.5 ms avg latency, 407.0 ms max latency.
4763549 records sent, 952709.8 records/sec (90.86 MB/sec), 0.3 ms avg latency, 15.0 ms max latency.
4392536 records sent, 878507.2 records/sec (83.78 MB/sec), 0.3 ms avg latency, 17.0 ms max latency.
4739980 records sent, 947996.0 records/sec (90.41 MB/sec), 0.3 ms avg latency, 14.0 ms max latency.
4678736 records sent, 935747.2 records/sec (89.24 MB/sec), 0.3 ms avg latency, 10.0 ms max latency.
4289005 records sent, 857801.0 records/sec (81.81 MB/sec), 0.4 ms avg latency, 6.0 ms max latency.
4657720 records sent, 931544.0 records/sec (88.84 MB/sec), 0.3 ms avg latency, 13.0 ms max latency.
4393393 records sent, 878678.6 records/sec (83.80 MB/sec), 0.3 ms avg latency, 11.0 ms max latency.
4286190 records sent, 857238.0 records/sec (81.75 MB/sec), 0.3 ms avg latency, 5.0 ms max latency.
4440910 records sent, 888182.0 records/sec (84.70 MB/sec), 0.3 ms avg latency, 11.0 ms max latency.
4519094 records sent, 903818.8 records/sec (86.19 MB/sec), 0.3 ms avg latency, 13.0 ms max latency.
50000000 records sent, 897311.654284 records/sec (85.57 MB/sec), 0.32 ms avg latency, 407.00 ms max latency, 0 ms 50th, 1 ms 95th, 2 ms 99th, 10 ms 99.9th.

I configured --throughput -1 to disable throttling; as we can see, the TPS is about 897,311 records/sec.

  • Consumer side
# Fetch 5,000,000 records (runs for about 3 seconds)
/opt/kafka_2.13-2.7.0/bin/kafka-consumer-perf-test.sh  --topic scheduler0 --messages 5000000 --bootstrap-server=xxx.xxx.xxx.xxx:9093,xxx.xxx.xxx.xxx:9094,xxx.xxx.xxx.xxx:9095
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-01-28 11:46:58:466, 2021-01-28 11:47:01:299, 476.8650, 168.3251, 5000292, 1765016.5902, 1611802018934, -1611802016101, -0.0000, -0.0031

# Fetch 10,000,000 records (runs for about 5 seconds)
/opt/kafka_2.13-2.7.0/bin/kafka-consumer-perf-test.sh  --topic scheduler0 --messages 10000000 --bootstrap-server=xxx.xxx.xxx.xxx:9093,xxx.xxx.xxx.xxx:9094,xxx.xxx.xxx.xxx:9095
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-01-28 11:45:58:096, 2021-01-28 11:46:03:224, 953.6926, 185.9775, 10000192, 1950115.4446, 1611801958563, -1611801953435, -0.0000, -0.0062

# Fetch 1,000,000,000 records (running for about 20 minutes)
/opt/kafka_2.13-2.7.0/bin/kafka-consumer-perf-test.sh  --topic scheduler0 --messages 1000000000 --bootstrap-server=xxx.xxx.xxx.xxx:9093,xxx.xxx.xxx.xxx:9094,xxx.xxx.xxx.xxx:9095
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-01-28 11:55:33:186, 2021-01-28 12:16:03:855, 95367.4615, 77.4924, 1000000313, 812566.4277, 1611802533652, -1611801302983, -0.0001, -0.6204

We can see the TPS is very high as well: when fetching 5,000,000 records the TPS is 1,765,016; when fetching 10,000,000 records it is 1,950,115; and when fetching 1,000,000,000 records it is 812,566.

I also tested the following scenario:
when running kafka-producer-perf-test.sh, I adjusted --num-records to a huge value so the producer script ran for a long time, and when running kafka-consumer-perf-test.sh, I likewise adjusted --messages to a huge value so the consumer script ran for a long time.

During benchmarking, I observed that CPU/memory consumption was not too high on any of the Kafka nodes.

FYI, in our production environment, the consumer offset-commit timeout (>= 5 seconds) issue did not reappear after we switched to the latest Kafka version.
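
If anyone wants to watch for the same symptom in their own cluster, committed offsets and consumer lag can be inspected with the stock Kafka CLI, for example (the consumer group name below is illustrative, not the exact group OpenWhisk uses):

# Describe committed offsets and lag for a consumer group (group name is illustrative)
/opt/kafka_2.13-2.7.0/bin/kafka-consumer-groups.sh --bootstrap-server xxx.xxx.xxx.xxx:9093 --describe --group completed0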

style95 commented Jan 28, 2021

What I meant was benchmarking OpenWhisk itself with this version of Kafka.
But anyway, it looks good to me as-is.

@ningyougang ningyougang reopened this Jan 29, 2021
rabbah commented Jan 29, 2021

Thanks @ningyougang and @style95!

@ningyougang ningyougang closed this Feb 1, 2021
@ningyougang ningyougang reopened this Feb 1, 2021
@ningyougang ningyougang closed this Feb 1, 2021
@ningyougang ningyougang reopened this Feb 1, 2021
@ningyougang ningyougang closed this Feb 2, 2021
@ningyougang ningyougang reopened this Feb 2, 2021
@ningyougang ningyougang closed this Feb 3, 2021
@ningyougang ningyougang reopened this Feb 3, 2021
@ningyougang ningyougang reopened this Feb 22, 2021
@jiangpengcheng jiangpengcheng merged commit df1970b into apache:master Apr 29, 2021