
architecture revamp: improved performance, nonblocking queue... #94

Merged: 35 commits, Mar 6, 2020

Conversation

@truthbk (Member) commented Dec 9, 2019

The goal of this PR is to improve client performance by removing client blocking calls, replacing them with non-blocking variants and letting the caller deal with failure.

Benchmark

A full-blast performance test has been added to try to understand the performance gains yielded by the ConcurrentLinkedList versus the LinkedBlockingQueue (the current implementation). The test is very simple (see: 3dddaef): 10 threads submit metrics in a busy loop with the statsd client, while a dummy statsd server receives the traffic on the other end.
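A stripped-down, self-contained analogue of that setup (hypothetical: plain queues and threads instead of the actual statsd client and dummy server, and Java 8 lambdas for brevity) looks roughly like this:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public final class QueueThroughput {

    // Time how long `producers` busy-looping threads take to push
    // `perProducer` datagram-like strings each through the queue while a
    // single consumer drains it.
    static long drainAllNanos(final Queue<String> queue, int producers,
                              final int perProducer) {
        final long total = (long) producers * perProducer;
        final AtomicLong consumed = new AtomicLong();

        Thread consumer = new Thread(() -> {
            while (consumed.get() < total) {
                if (queue.poll() != null) {
                    consumed.incrementAndGet();
                }
            }
        });

        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perProducer; j++) {
                    // offer() can return false on a bounded LinkedBlockingQueue,
                    // never on a ConcurrentLinkedQueue; retry until accepted.
                    while (!queue.offer("some.metric:1|c")) { }
                }
            });
        }

        long start = System.nanoTime();
        consumer.start();
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            consumer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return System.nanoTime() - start;
    }
}
```

Comparing `drainAllNanos(new ConcurrentLinkedQueue<String>(), 10, n)` against `drainAllNanos(new LinkedBlockingQueue<String>(1024), 10, n)` gives the same flavor of contention comparison, minus the message building and socket writes.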

The following numbers were obtained on a late-2015 13-inch MacBook Pro (3.1GHz i7, 16GB RAM).

1024 element queue:

  • ConcurrentLinkedList
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:30:10 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 12470160
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.159 sec
  • LinkedBlockingQueue
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:26:36 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 1670500
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.128 sec

2048 element queue:

  • ConcurrentLinkedList
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:20:38 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 11791287
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.195 sec
  • LinkedBlockingQueue
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:23:07 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 2399716
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.095 sec

4096 element queue:

  • ConcurrentLinkedList
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:38:35 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 11745778
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.178 sec
  • LinkedBlockingQueue
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:39:58 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 2606678
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.09 sec

Integer.MAX_VALUE queue:

  • ConcurrentLinkedList
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:46:40 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 1566180
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 51.564 sec
  • LinkedBlockingQueue
Running com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest
Dec 10, 2019 6:44:54 PM com.timgroup.statsd.NonBlockingStatsDClientMaxPerfTest perf_test
INFO: Messages at server: 7216704
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 58.014 sec

Conclusions

While there are some very clear gains with the ConcurrentLinkedList, particularly when comparing the best performance achieved, the diminishing returns at larger queue sizes in the ConcurrentLinkedList scenarios are remarkable and hard to explain, particularly given that the changes to the message building and the actual channel (socket) writing logic are minimal.

While the LinkedBlockingQueue scenario steadily gains performance with larger queues, we see the reverse behavior with our ConcurrentLinkedList. Note that, due to implementation details, a few operations on the ConcurrentLinkedList are not constant time O(1) as one would expect (for instance, size()). However, we tried to steer clear of these O(n) methods, so it is currently challenging to explain the difference in performance and the issues we encountered.

(to be continued....)

@ogaca-dd (Contributor) left a comment:

There is an interesting comment in ConcurrentLinkedList that should explain the behavior you noticed.

A very simple linked list that supports concurrent access. Internally, it uses immutable objects. It uses recursion and is not meant for long lists.
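To make the quoted caveat concrete, here is a toy sketch (my own illustration, not the actual class the reviewer quotes) of a recursive, immutable list: every traversal costs one stack frame per element, so operations degrade, and can even overflow the stack, on long lists such as a queue holding thousands of pending messages.

```java
// Toy recursive, immutable list (hypothetical; not the quoted class).
// size() recurses once per element: O(n) time AND O(n) stack depth, so
// long lists risk both slowness and StackOverflowError -- consistent
// with the degradation observed at larger queue sizes.
final class Node<E> {
    final E head;
    final Node<E> tail;

    Node(E head, Node<E> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <E> int size(Node<E> node) {
        return node == null ? 0 : 1 + size(node.tail); // one frame per element
    }
}
```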

pom.xml (outdated diff)
@@ -105,7 +105,7 @@
 <configuration>
 <rules>
 <requireJavaVersion>
-<version>[1.7.0-0,1.9.0-0)</version>
+<version>[1.7.0-0,1.9.0-0),[9.0.0,11.0.5]</version>
Contributor comment:

When running mvn install -Dgpg.skip with openjdk 12.0.2 I get:

[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (validate) @ java-dogstatsd-client ---
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message:
Detected JDK Version: 12.0.2 is not in the allowed range [1.7.0-0,1.9.0-0),[9.0.0,11.0.5].
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.956 s
[INFO] Finished at: 2020-01-13T17:12:17+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (validate) on project java-dogstatsd-client: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.

If I replace [9.0.0,11.0.5] with [9.0.0,12.0.5], mvn install -Dgpg.skip succeeds.

Note: mvn install -Dgpg.skip succeeds on the master branch without any changes.

@truthbk (Member, Author) replied:

Yeah, that's because Java 12 was not a supported Java version. I can bump it to Java 12; it shouldn't really matter, since we'll most likely be building on Java 8 while targeting Java 7, so I'm not too concerned, but I'll bump this. If it works on Java 12, there's no reason not to support it.
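For reference, a hedged sketch of what the widened enforcer rule could look like (the exact upper bound here is an assumption; a later commit in this PR mentions supporting up to Java 13.0.2):

```xml
<requireJavaVersion>
  <!-- sketch only: widen the second range past Java 12; the 13.0.2 upper
       bound is taken from the later "[mvn] support up to Java 13.0.2." commit -->
  <version>[1.7.0-0,1.9.0-0),[9.0.0,13.0.2]</version>
</requireJavaVersion>
```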

-        queue.offer(message);
-        return true;
+        if (qSize.get() < qCapacity) {
+            queue.offer(message);
Contributor comment:
The signature for offer is boolean java.util.Queue.offer(String e). I do not know whether the function can return false for a ConcurrentLinkedQueue, but if it can, qSize.get() may not be correct. This could lead to a situation where the queue is empty but qSize.get() >= qCapacity.

@truthbk (Member, Author) replied:

It's true that qCapacity is not a fully accurate picture. I resorted to this "soft guide" to set bounds on the queue size; exceeding the bound somewhat wasn't a problem. Since every push is eventually matched by a pop, and we cannot pop when the queue is empty, the value of qSize can never go negative. We can, however, exceed qCapacity, in which case we drop messages, so the value is guaranteed to come back down as we pop() in the processing routine. What I'm trying to say is that even though the value is not exact, it still lets us enforce an approximate bound on the queue size.

Finally, offer() cannot return false for a ConcurrentLinkedQueue. From the javadoc:

As the queue is unbounded, this method will never return false.

Let me know if you see any holes in my reasoning.
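The scheme described above can be sketched as follows (a minimal illustration with hypothetical names, not the PR's actual code): the AtomicInteger gives an O(1) approximate size, the capacity check is racy but self-correcting, and the caller handles drops.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the "soft bound": an unbounded
// ConcurrentLinkedQueue (offer() never returns false) paired with an
// AtomicInteger approximating the current size in O(1), avoiding the
// O(n) ConcurrentLinkedQueue.size().
final class SoftBoundedQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger qSize = new AtomicInteger(0);
    private final int qCapacity;

    SoftBoundedQueue(int capacity) {
        this.qCapacity = capacity;
    }

    // Non-blocking: drop the message instead of blocking the caller.
    // The check is racy -- several producers can pass it at once -- so the
    // real size can briefly exceed qCapacity, which is acceptable here.
    boolean send(E message) {
        if (qSize.get() < qCapacity) {
            queue.offer(message); // never false for ConcurrentLinkedQueue
            qSize.incrementAndGet();
            return true;
        }
        return false; // "full" (approximately): caller handles the drop
    }

    // Consumer side: each successful poll decrements the counter, so it
    // never goes negative and always drifts back toward the true size.
    E take() {
        final E message = queue.poll();
        if (message != null) {
            qSize.decrementAndGet();
        }
        return message;
    }
}
```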

import org.junit.BeforeClass;
import org.junit.Test;

import static org.junit.Assert.assertNotEquals;
Contributor comment:
Note: I had some issues compiling the code with this line.

@truthbk truthbk changed the title [sender] improve performance - use nonblocking queue - WIP [sender] improve performance - use nonblocking queue Feb 5, 2020
@ogaca-dd (Contributor) left a comment:

LGTM. I did a quick review of all changes and did not see anything suspicious.

return;
}

final String message = messages.poll(WAIT_SLEEP_MS, TimeUnit.MILLISECONDS);
Contributor comment:

Since several threads can poll messages, if an application calls send("foo") and then send("bar"), the buffer containing "bar" can be queued before the one containing "foo": there is no guarantee about the order of the messages.
Is that a problem? I am thinking of the case of receiving an Event stop before an Event start.

@truthbk (Member, Author) replied:

Indeed, what you say is accurate. I had a technical discussion about this with @olivielpeau, and we don't think it's a problem, at least not in any obvious manner. Even though the reordering is definitely possible, in practice it should rarely matter, because it would typically only affect metrics submitted within a very short timeframe. In any case, this was one of the things we specifically kept track of during QA.
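The reordering the reviewer describes can be forced deterministically with latches. This is a hypothetical demonstration (the names and structure are mine, not the client's code) of two sender workers flushing out of order:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Two workers poll the same queue; latches make worker A stall after
// polling "foo" (as a slow socket write might), so worker B's "bar"
// reaches the simulated wire first despite being enqueued second.
public final class ReorderDemo {
    public static List<String> run() {
        final BlockingQueue<String> messages = new LinkedBlockingQueue<>();
        messages.offer("foo"); // send("foo")
        messages.offer("bar"); // send("bar")

        final List<String> wire = new CopyOnWriteArrayList<>(); // simulated socket
        final CountDownLatch aPolled = new CountDownLatch(1);
        final CountDownLatch bFlushed = new CountDownLatch(1);

        Thread workerA = new Thread(() -> {
            try {
                String m = messages.take(); // polls "foo" first...
                aPolled.countDown();
                bFlushed.await();           // ...then stalls on a slow write
                wire.add(m);
            } catch (InterruptedException ignored) { }
        });
        Thread workerB = new Thread(() -> {
            try {
                aPolled.await();            // A already holds "foo"
                wire.add(messages.take());  // "bar" hits the wire first
                bFlushed.countDown();
            } catch (InterruptedException ignored) { }
        });
        workerA.start();
        workerB.start();
        try {
            workerA.join();
            workerB.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return wire; // ["bar", "foo"]
    }
}
```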

@truthbk truthbk changed the title [sender] improve performance - use nonblocking queue [sender] architectural revamp: improved performance, nonblocking queue... Mar 6, 2020
@truthbk truthbk changed the title [sender] architectural revamp: improved performance, nonblocking queue... architecture revamp: improved performance, nonblocking queue... Mar 6, 2020
@truthbk truthbk merged commit ad47973 into master Mar 6, 2020
@truthbk truthbk added this to the 2.10.0 milestone May 5, 2020
dbyron0 pushed a commit to locationlabs/java-dogstatsd-client that referenced this pull request Jan 11, 2021
…Dog#94)

* [sender] use concurrent linked queue - non-blocking - to message enqueueing

* [tests] fix integration testing

* comment

* [sender] sleep less

* [sender] decrement when taking message from queue

* [test] add max throughput performance test

* [travis] actually do run tests

* [test] fix test teardown

* sender: we should only need to get the address once

* parametrize test workers

* Adding buffer pool and multi-worker sender

* [buffer] fixes

* [test] max_perf: default q-size 1024

* improve interruption handling behavior + collateral

* [client] adding more constructors for woker configurability

* [test] make max perf test parametric

* [maven] add surefire plugin + heap args

* [test] make max performance test less memory intensive + remove unbounded queue test

* [maven] bump surefire

* abstract class and blocking+nonblocking processor

* better construction + class abstractions

* [test] fix them - less is more

* [processor] multi-worker support

* blocking processor: remove unused imports

* [test] we can reuse port, support more scenarios

* [maven] add checkstyle

* [maven] add checkstyle

* [checkstyle] fix all checkstyle offenses

* [javadoc] fix issues with javadoc

* [mvn] silence illegal access warning; not real.

* [travis] attempt to checkstyle with a different version on Java7

* removing unused imports.

* [mvn] support up to Java 13.0.2.

* [circleci] add openjdk13 to test matrix

* [circleci] no checkstyle on java7 + remove travis (DataDog#100)

* [circleci] no checkstyle on java7

* [travis] get off travis