
Performance benchmarks

js-labs edited this page Sep 23, 2014 · 1 revision

For the last few years the trend in CPU development has been to increase the number of cores rather than the performance of a single core. Applications that need good performance cannot rely on efficient code alone; the code must also scale across cores. The main tasks of a network application usually are:

  • read data from socket
  • process data
  • send data (optionally)

One peculiarity of Java is that socket operations are quite expensive in terms of performance. It would be nice to be able to scale these operations, and the JS-Collider framework is an attempt to do so. Even a single TCP/IP session is scaled across up to 3 threads: one thread reads from the socket channel, another thread processes the data, and a third thread writes to the socket channel.
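The three-way split of a single session can be sketched in plain Java. This is only an illustration of the idea, not JS-Collider code: the class name `ThreeThreadSessionSketch` is hypothetical, the "socket" is replaced by an in-memory list, and the stages are connected with ordinary lock-based `ArrayBlockingQueue`s for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one session served by reader, processor and writer threads.
final class ThreeThreadSessionSketch {
    // Unique end-of-stream marker, compared by identity (never equal to real data).
    private static final String END = new String("end");

    static List<String> run(List<String> incoming) {
        BlockingQueue<String> received = new ArrayBlockingQueue<>(64);
        BlockingQueue<String> processed = new ArrayBlockingQueue<>(64);
        List<String> sent = new ArrayList<>();

        // Stage 1: stands in for the thread reading the socket channel.
        Thread reader = new Thread(() -> {
            try {
                for (String block : incoming) received.put(block);
                received.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Stage 2: processes each block as soon as it is available.
        Thread processor = new Thread(() -> {
            try {
                for (String b = received.take(); b != END; b = received.take())
                    processed.put("echo:" + b);
                processed.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Stage 3: stands in for the thread writing the socket channel.
        Thread writer = new Thread(() -> {
            try {
                for (String b = processed.take(); b != END; b = processed.take())
                    sent.add(b);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start(); processor.start(); writer.start();
        try {
            reader.join(); processor.join(); writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent; // safe to read: join() orders the writer's updates before this point
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "c")));
    }
}
```

Each stage can work on a different block at the same time, which is where the scaling comes from. The queues here take a lock on every hand-off, so this sketch shows the shape of the pipeline rather than its achievable speed.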

Here is the execution flow of the common approach in network applications:

working thread   +-------------+   +-------------+   +--------------+   +--------------+   +--------------+
-----------------| wait socket |-->| read socket |-->| process data |-->| read socket  |-->| process data |--->
                 +-------------+   +-------------+   +--------------+   +--------------+   +--------------+

and JS-Collider:

working thread 1 +-------------+   +-------------+   +--------------+   +--------------+   +-------------+
-----------------| wait socket |-->| read socket |-+>| process data |-+>| process data |-->|             |--->
                 +-------------+   +-------------+ | +--------------+ | +--------------+   |             |
                                                   |                  |                    |   PROFIT    |
working thread 2                                   |  +-------------+ |                    |             |
---------------------------------------------------+->| read socket |-+------------------->|             |--->
                                                      +-------------+                      +-------------+

This lets the data processor start handling the next received data block as soon as it has finished with the previous one, without spending time reading the socket. Such a design usually does not work well because of the cost of thread synchronization. The main distinction of the JS-Collider framework is that it makes wide use of atomic states and specially designed lock-free containers, reducing the price of synchronization to an atomic CAS operation. It also includes a specially tuned executor that has better latency and throughput than the JSR-166 executor implementation and does not allocate any objects.
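To make "synchronization reduced to a CAS" concrete, here is a minimal sketch of a well-known lock-free container, the Treiber stack, built on `AtomicReference.compareAndSet`. It is a textbook structure chosen for illustration; the class name `LockFreeStack` is hypothetical and this is not one of JS-Collider's own containers.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a Treiber stack, where every update is a single CAS on the head.
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
            // Retry if another thread changed the head in the meantime.
        } while (!head.compareAndSet(oldHead, newHead));
    }

    /** Returns the most recently pushed value, or null if the stack is empty. */
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null;
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }

    public static void main(String[] args) {
        LockFreeStack<String> stack = new LockFreeStack<>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop()); // prints "second"
    }
}
```

No thread ever blocks: a thread that loses the race simply re-reads the head and tries again. Note that this sketch allocates a node per push, so it does not reproduce the allocation-free property claimed for the framework's executor.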

Let's compare the performance of the JS-Collider framework with that of probably the most popular open source Java network framework (guess which :) on the following hardware:

  • Intel Core i7 920 2.6 GHz, 4 cores, HT (8 logical processors)
  • 6 GB memory
  • Windows 7

Echo throughput

The client connects to the server and sends 100000 messages (500 bytes each) in a batch, receiving replies at the same time. The server echoes each message back to the client. The client measures the time (in seconds) from the first message sent until the last message received. The test was run with different numbers of concurrently running sessions.
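The measurement described above can be sketched for a single session with plain blocking sockets over loopback. This is an assumption-laden illustration, not the benchmark code that was actually run: the class `EchoThroughputSketch` and its method are hypothetical, and neither JS-Collider nor the other framework is involved.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of one echo session; returns the number of bytes echoed back.
final class EchoThroughputSketch {
    static long runSession(int messages, int messageSize) {
        final long total = (long) messages * messageSize;
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Server side: echo everything until the client closes its output.
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            echo.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                // Sender: writes the whole batch without waiting for replies.
                Thread sender = new Thread(() -> {
                    try {
                        OutputStream out = client.getOutputStream();
                        byte[] msg = new byte[messageSize];
                        for (int i = 0; i < messages; i++) out.write(msg);
                        client.shutdownOutput(); // signals end of batch to the server
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
                long start = System.nanoTime(); // just before the first message is sent
                sender.start();

                // Receiver: reads replies concurrently with the sender.
                InputStream in = client.getInputStream();
                byte[] buf = new byte[8192];
                long received = 0;
                int n;
                while (received < total && (n = in.read(buf)) > 0) received += n;
                double seconds = (System.nanoTime() - start) / 1e9; // last reply received

                sender.join();
                echo.join();
                System.out.printf("%d messages of %d bytes echoed in %.3f s%n",
                        messages, messageSize, seconds);
                return received;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        runSession(100000, 500);
    }
}
```

Sending and receiving run on separate threads so that the client really does receive "at the same time" as it sends; a single-threaded client that wrote the whole batch first could stall once the socket buffers fill up.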

Broadcast throughput

This test pattern is typical for applications such as a multiplayer game server. Each message received from a client is forwarded to all connected clients, so each client sends 100000 messages (500 bytes each) and receives back 100000*(NUMBER_OF_CLIENTS) messages. The test time is measured from the very first message sent by any client until the last message is received by the last client. The test was run with different numbers of concurrently running sessions.
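The load in this test grows quadratically with the number of clients, which a few lines of arithmetic make explicit. The class `BroadcastLoad` below is a hypothetical helper, and the session counts in `main` are illustrative values rather than the ones actually measured.

```java
// Hypothetical helper: message counts for the broadcast test described above.
final class BroadcastLoad {
    /** Messages each client receives: every client's messages are sent to everyone. */
    static long receivedPerClient(long messagesPerClient, long clients) {
        return messagesPerClient * clients;
    }

    /** Messages the server has to deliver in total, across all clients. */
    static long totalDelivered(long messagesPerClient, long clients) {
        return receivedPerClient(messagesPerClient, clients) * clients;
    }

    /** Payload bytes the server has to write in total. */
    static long totalPayloadBytes(long messagesPerClient, long clients, long messageSize) {
        return totalDelivered(messagesPerClient, clients) * messageSize;
    }

    public static void main(String[] args) {
        // Illustrative session counts, not taken from the benchmark results.
        for (int clients : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("%2d clients: %,d delivered, %,d payload bytes%n",
                    clients,
                    totalDelivered(100000, clients),
                    totalPayloadBytes(100000, clients, 500));
        }
    }
}
```

With 16 clients, for example, each client receives 1,600,000 messages and the server delivers 25,600,000 messages in total, or 12.8 billion payload bytes at 500 bytes per message.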

Running the other framework's test with 16 sessions gives the following:

#
[thread 17772 also had an error]
[thread 27828 also had an error]
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000000241f94c, pid=24264, tid=21816
#
# JRE version: 7.0_21-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# J  sun.nio.ch.IOUtil.write(Ljava/io/FileDescriptor;[Ljava/nio/ByteBuffer;IILsun/nio/ch/NativeDispatcher;)J
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

Anything else?

The core of any messaging framework is communication. Let's compare performance with a really fast messaging framework: ZeroMQ (4.0.4). It has a limited set of hardcoded usage patterns; we compare the PUB-SUB pattern, running a test with one publisher and a varying number of subscribers. The ZeroMQ test is implemented in C++ as an optimized build. The test sends 100000 messages of 100 bytes each; the time is measured from the moment the publisher sends the first message until the moment the last subscriber receives the last message.

Wow! The Java code performs no worse than the C++ code. The Collider implementation required a bit more coding than the ZeroMQ one because it is asynchronous by nature and its API works with raw data.
