
Zilla crashes when a lot of MQTT clients are connected #762

Closed
vordimous opened this issue Jan 24, 2024 · 4 comments · Fixed by #828
Labels
bug Something isn't working

Comments

@vordimous
Contributor

Describe the bug
Running the taxi-demo with the load_test.sh script simulates a large number of connected MQTT clients, producing ~100k messages within a few minutes. Zilla crashes with the error below:

org.agrona.concurrent.AgentTerminationException: java.lang.NullPointerException: Cannot read field "initialAck" because "stream" is null
at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:707)
at org.agrona.core/org.agrona.concurrent.AgentRunner.doDutyCycle(AgentRunner.java:291)
at org.agrona.core/org.agrona.concurrent.AgentRunner.run(AgentRunner.java:164)
at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NullPointerException: Cannot read field "initialAck" because "stream" is null
at io.aklivity.zilla.runtime.binding.kafka@0.9.60/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientConnection.onConnectionWindow(KafkaClientConnectionPool.java:1574)
at io.aklivity.zilla.runtime.binding.kafka@0.9.60/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientConnection.onConnectionMessage(KafkaClientConnectionPool.java:1383)
at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.handleReadInitial(DispatchAgent.java:1106)
at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.handleRead(DispatchAgent.java:1041)
at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.concurent.ManyToOneRingBuffer.read(ManyToOneRingBuffer.java:181)
at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:701)
... 3 more
Suppressed: java.lang.Exception: [engine/data#3][0x03030000000005a5] streams=[consumeAt=0x0027b688 (0x000000000227b688), produceAt=0x00302eb8 (0x0000000002302eb8)]
at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:705)
... 3 more
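
The stack trace suggests that onConnectionWindow looks up the stream associated with an incoming WINDOW frame and dereferences it without a null check, so a window arriving for a stream that has already been cleaned up (plausible under heavy client churn) triggers the NPE. Below is a minimal sketch of that failure mode with a defensive guard; the class, field, and method names are illustrative assumptions, not Zilla's actual implementation.

// Hypothetical sketch of the failure mode seen in the stack trace; names below
// are illustrative, not Zilla's actual code.
import org.agrona.collections.Long2ObjectHashMap;

final class ConnectionWindowSketch
{
    static final class Stream
    {
        long initialAck;
    }

    // Streams indexed by stream id; entries are removed when a stream ends.
    private final Long2ObjectHashMap<Stream> streams = new Long2ObjectHashMap<>();

    void onConnectionWindow(long streamId, long acknowledge)
    {
        final Stream stream = streams.get(streamId);

        // Without this guard, a WINDOW arriving after the stream was released
        // reproduces: Cannot read field "initialAck" because "stream" is null
        if (stream == null)
        {
            return; // ignore windows for already-released streams
        }

        stream.initialAck = Math.max(stream.initialAck, acknowledge);
    }
}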

To Reproduce
Steps to reproduce the behavior:

  1. Go to taxi-demo on the load-test branch
  2. Follow the demo instructions to start the demo
  3. Review the load testing instructions
  4. With replication set to 300, run the load_test.sh script 2-5 times, or until Zilla throws an error (a rough sketch of the kind of MQTT client load the script generates is shown below).
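
For context, the load test amounts to fanning out many concurrent MQTT publishers against Zilla. A rough sketch of that pattern using Eclipse Paho follows; the broker URL, topic, and message counts are illustrative assumptions, not the actual contents of load_test.sh.

// Rough sketch of an MQTT fan-out load generator; broker URL, topic, and counts
// are illustrative assumptions, not the actual contents of load_test.sh.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class MqttLoadSketch
{
    public static void main(String[] args) throws Exception
    {
        final String broker = "tcp://localhost:1883"; // assumed local Zilla MQTT endpoint
        final int clients = 300;                      // mirrors the replication count
        final int messagesPerClient = 350;            // ~100k messages overall

        final ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++)
        {
            final int id = c;
            pool.submit(() ->
            {
                try
                {
                    MqttClient client = new MqttClient(broker, "load-client-" + id, new MemoryPersistence());
                    MqttConnectOptions options = new MqttConnectOptions();
                    options.setCleanSession(true);
                    client.connect(options);

                    for (int m = 0; m < messagesPerClient; m++)
                    {
                        MqttMessage message = new MqttMessage(("position " + m).getBytes());
                        message.setQos(0);
                        client.publish("taxi/" + id + "/location", message);
                    }

                    client.disconnect();
                    client.close();
                }
                catch (Exception e)
                {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}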

Expected behavior
Zilla should handle this many clients without crashing.

Additional context
Zilla logs recorded here

@vordimous vordimous added the bug Something isn't working label Jan 24, 2024
@akrambek
Contributor

@vordimous is it consistently reproducible?

@vordimous
Contributor Author

@akrambek yes, it was for me.

@akrambek
Contributor

Thanks, then I will close #716 as it has the same stack trace.

@akrambek
Contributor

Blocked by #770
