Usage of direct buffers in FS2 Socket causes unpredictable memory use in http4s with the Ember backend #3370

Closed
anesto-deltoro opened this issue Jan 11, 2024 · 5 comments · Fixed by #3371

@anesto-deltoro

This issue is a follow-up to http4s issue #7343.

Recently, while updating our services, we bumped our dependencies to the latest versions of the Typelevel stack and replaced the http4s Blaze backend with Ember, as suggested in the official documentation.

After the upgrade, we started receiving alerts that our services were being restarted from time to time.

We took a close look at the metrics and noticed that the services were killed due to native memory usage: the resident set size (RSS) reached the native memory limit, while heap memory was nowhere near its limit (see figure below). No OOM exception was ever recorded.
[Figure: RSS and heap memory usage over time; RSS climbs to the native memory limit while heap stays well below its limit]
We analyzed the memory and thread dumps of the services and didn’t find any clear indication of a memory leak.

Based on these observations, we noticed odd behavior in the direct buffers, so we decided to compare the http4s backends Blaze and Ember.

For each http4s backend, we defined a dummy http4s server with one simple endpoint (/ping → Ok("pong")) and simulated a huge number of concurrent requests (1M total requests, thread pool size of 10). While doing so, we recorded the direct buffer behavior via VisualVM (see the figures below; a minimal server sketch follows them).
[Figures: DirectBuffer usage over time recorded via VisualVM, one chart for the Blaze backend and one for the Ember backend]
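For reference, here is a minimal sketch of the kind of dummy Ember server used in the comparison. Names, host, and port are illustrative; the full setup lives in the repo linked below.

```scala
import cats.effect.{IO, IOApp}
import com.comcast.ip4s._
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.implicits._

object PingServer extends IOApp.Simple {

  // One endpoint: GET /ping responds with 200 "pong".
  private val routes = HttpRoutes.of[IO] { case GET -> Root / "ping" =>
    Ok("pong")
  }

  // Bind an Ember server for the lifetime of the application.
  def run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(ipv4"0.0.0.0")
      .withPort(port"8080")
      .withHttpApp(routes.orNotFound)
      .build
      .useForever
}
```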

💡 As can be seen, Blaze keeps direct buffer usage stable, while Ember keeps allocating more DirectBuffers. If the DirectBuffers are not released by a garbage collector cycle quickly enough (e.g. because there is still plenty of heap memory available, so no GC is triggered), this leads to process termination, as described above.

It is worth stressing that if we force a GC run, the memory is released, which explains why we couldn't find any hints of a memory leak in our analysis of the memory and thread dumps. The restarts occur when the JVM heap is not under pressure: no GC cycle is triggered, yet native memory reaches the limit due to the unbounded growth of DirectBuffers.
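A toy illustration of that dynamic (this is not the actual fs2 or Ember code path, just a sketch of how native memory can grow without an actual leak):

```scala
import java.nio.ByteBuffer

object DirectBufferChurn extends App {
  // Each iteration allocates a direct buffer and immediately drops the
  // reference. The native memory behind these buffers is only reclaimed when
  // a GC cycle collects the (tiny) ByteBuffer objects on the heap, so while
  // the heap is under no pressure the resident set size keeps climbing even
  // though nothing is retained.
  for (_ <- 1 to 1000000) {
    val buf = ByteBuffer.allocateDirect(16 * 1024)
    buf.put(0, 1.toByte) // write one byte so the buffer is actually touched
  }
  println("done; watch RSS / the direct buffer pool in VisualVM while this runs")
}
```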

We have no control over how our APIs are used by external clients, so we can't rely on/assume optimal use of connection pools.

We took a look at the PR that introduced the use of direct buffers, and, as noted in the comments by @djspiewak, the performance gains were marginal (almost no difference).

We created an internal fork of fs2, replaced the use of direct buffers with indirect (heap) buffers, and confirmed that memory use is now stable and predictable and our services no longer restart.
We understand that direct buffers should offer a performance benefit, but the unpredictable memory behavior is probably too high a price to pay, especially when it translates into service restarts in production.
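Conceptually, the change in our fork amounts to preferring heap-backed buffers over direct ones. A minimal sketch of the two allocation modes at the NIO level (not the fs2 code itself):

```scala
import java.nio.ByteBuffer

object BufferKinds {
  // Direct buffer: backed by native (off-heap) memory, counted against RSS,
  // and released only when its ByteBuffer object is GC'd (or cleaned manually).
  val direct: ByteBuffer = ByteBuffer.allocateDirect(16 * 1024)

  // Heap (indirect) buffer: an ordinary byte[] on the JVM heap, so its memory
  // use is bounded by -Xmx and reclaimed under normal GC pressure.
  val heap: ByteBuffer = ByteBuffer.allocate(16 * 1024)
}
```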

Here is a GitHub repo to reproduce our experiments: ember-fs2-demo
Here are some details of the configuration we use:

Kubernetes Pod configuration:
Limits:
  memory: 2Gi
Requests:
  memory: 2Gi
  cpu: 500m

JAVA_OPTS:
-XX:+PrintFlagsFinal
-Djava.rmi.server.hostname=127.0.0.1
-server
-Xms1340M
-Xmx1340M
-XX:MaxMetaspaceSize=256M
-XX:MetaspaceSize=256M
-XX:ReservedCodeCacheSize=240M
-Xss1M
-XX:MaxGCPauseMillis=100
-XX:+UseG1GC

Versions:

java version: openjdk-11.0.14-jdk-slim

Ember dummy project versions:
cats-effect:3.5.2
http4s-ember-server:0.23.24
fs2:3.9.3

Blaze dummy project versions:
http4s-blaze-server:0.23.15

@djspiewak
Member

Wow this is an absolute wealth of investigative data, thank you.

@mpilquist
Member

Agreed, fantastic work!

Thoughts on switching entirely to indirect buffers versus making the buffer type configurable (and defaulting to indirect)?

@armanbilge
Member

So, for the record, the JDK does fast-path direct buffers. Of course, this doesn't mean the performance difference is measurable, or that the trade-off is worth it with respect to predictability of memory use.

https://github.com/openjdk/jdk/blob/93bedd7abae33f5d5eb909d3d216ee415ad2f8b2/src/java.base/share/classes/sun/nio/ch/IOUtil.java#L286-L294
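For illustration only (the copy behavior is a JDK implementation detail, visible in the IOUtil source linked above): when a heap buffer is written to a channel, the JDK copies it into a pooled temporary direct buffer before the native write, whereas a direct buffer is handed straight to the native call.

```scala
import java.nio.ByteBuffer
import java.nio.channels.SocketChannel

object HeapWriteSketch {
  // Writing a heap-backed buffer to a channel: sun.nio.ch.IOUtil (linked
  // above) copies it into a cached temporary *direct* buffer before the
  // native write. Passing an already-direct buffer skips that extra copy,
  // which is the "fast path" being referred to.
  def writeAll(channel: SocketChannel, data: Array[Byte]): Unit = {
    val buf = ByteBuffer.wrap(data) // heap buffer, backed by the byte array
    while (buf.hasRemaining) channel.write(buf)
  }
}
```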


Other projects, such as Netty, Lucene, and Hadoop, use sun.misc.Unsafe.invokeCleaner to manually manage the lifecycle of direct ByteBuffers.

https://github.com/netty/netty/blob/7ad4c4b0795b71f46577cb1bb4c6f4244e209a89/common/src/main/java/io/netty/util/internal/CleanerJava9.java#L46

https://github.com/apache/lucene/blob/8bee41880e41024109bf5729584ebc5dd1003717/lucene/core/src/java/org/apache/lucene/store/MappedByteBufferIndexInputProvider.java#L167

https://github.com/apache/hadoop/blob/2f1e1558b6fc3b15fc753d029212268699ad0ae3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/CleanerUtil.java#L97-L98

We've generally avoided reaching for hacks like that in CE/FS2. However, it would give us a mechanism to safely use direct ByteBuffers, and in the cases where we are unable to use those internal APIs we could fall back to indirect ByteBuffers.
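A minimal sketch of that manual-cleanup approach, assuming Java 9+ where sun.misc.Unsafe#invokeCleaner exists (this is not something fs2 currently does; names here are illustrative):

```scala
import java.nio.ByteBuffer

object DirectBufferCleaner {
  // Obtain the Unsafe singleton reflectively, as Netty/Lucene/Hadoop do.
  private val unsafe: sun.misc.Unsafe = {
    val field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[sun.misc.Unsafe]
  }

  // Release the native memory behind a direct ByteBuffer immediately instead
  // of waiting for a GC cycle. Must only be called on a buffer that is direct,
  // is not a slice/duplicate, and is no longer referenced anywhere else.
  def free(buffer: ByteBuffer): Unit =
    if (buffer.isDirect) unsafe.invokeCleaner(buffer)
}

object CleanerDemo extends App {
  val buf = ByteBuffer.allocateDirect(64 * 1024)
  // ... use buf for I/O ...
  DirectBufferCleaner.free(buf) // native memory reclaimed right away
}
```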

@danicheg
Member

Perhaps changing the GC from G1 to Shenandoah could also be helpful here. On JDK 11 it's available since 11.0.9, so you should be able to try it out.
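If I recall correctly, the JDK 11 backport keeps Shenandoah behind an experimental flag, so (an assumption, not something verified in this thread) the JAVA_OPTS change would be roughly:
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC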

@mpilquist
Member

Fixed in #3371
