Usage of direct buffers in FS2 Socket causes unpredictable memory use in http4s with the Ember backend #3370

Closed
anesto-deltoro opened this issue Jan 11, 2024 · 5 comments · Fixed by #3371

@anesto-deltoro

This issue is a follow-up to http4s issue #7343.

Recently, while updating our services, we bumped our dependencies to the latest versions of the Typelevel stack and replaced the http4s Blaze backend with Ember, as suggested in the official documentation.

After the upgrade, we started receiving alerts that our services were being restarted from time to time.

We took a close look at the metrics and noticed that the services were killed due to native memory usage: the resident set size (RSS) reached the native memory limit, while heap memory was nowhere near its limit (see figure below). No OOM exception was ever recorded.
[Figure: RSS and heap memory usage over time; RSS climbs to the native memory limit while heap stays well below its limit]
We analyzed the memory and thread dumps of the services and didn’t find any clear indication of a memory leak.

Based on these observations, we noticed odd behavior in the direct buffers, so we decided to compare the http4s backends Blaze and Ember.

For each http4s backend, we defined a dummy http4s server with one simple endpoint (/ping → Ok("pong")) and simulated a huge number of concurrent requests (1M total requests, thread pool size of 10). While doing so, we recorded the direct buffer behavior via VisualVM (see the figures below; a minimal server sketch follows them).
[Figures: DirectBuffer usage over time recorded via VisualVM, one chart for the Blaze backend and one for the Ember backend]
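For reference, here is a minimal sketch of the kind of dummy Ember server used in the comparison. Names, host, and port are illustrative; the full setup lives in the repo linked below.

```scala
import cats.effect.{IO, IOApp}
import com.comcast.ip4s._
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.implicits._

object PingServer extends IOApp.Simple {

  // One endpoint: GET /ping responds with 200 "pong".
  private val routes = HttpRoutes.of[IO] { case GET -> Root / "ping" =>
    Ok("pong")
  }

  // Bind an Ember server for the lifetime of the application.
  def run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(ipv4"0.0.0.0")
      .withPort(port"8080")
      .withHttpApp(routes.orNotFound)
      .build
      .useForever
}
```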

💡 As can be seen, Blaze keeps direct buffer usage stable, while Ember keeps allocating more DirectBuffers. If the DirectBuffers are not released by a garbage collector cycle quickly enough (e.g. because there is still plenty of heap memory available, so no GC is triggered), this leads to process termination, as described above.

It is worth stressing that if we force a GC run, the memory is released, which explains why we couldn't find any hints of a memory leak in our analysis of the memory and thread dumps. The restarts occur when the JVM heap is not under pressure: no GC cycle is triggered, yet native memory reaches the limit due to the unbounded growth of DirectBuffers.
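A toy illustration of that dynamic (this is not the actual fs2 or Ember code path, just a sketch of how native memory can grow without an actual leak):

```scala
import java.nio.ByteBuffer

object DirectBufferChurn extends App {
  // Each iteration allocates a direct buffer and immediately drops the
  // reference. The native memory behind these buffers is only reclaimed when
  // a GC cycle collects the (tiny) ByteBuffer objects on the heap, so while
  // the heap is under no pressure the resident set size keeps climbing even
  // though nothing is retained.
  for (_ <- 1 to 1000000) {
    val buf = ByteBuffer.allocateDirect(16 * 1024)
    buf.put(0, 1.toByte) // write one byte so the buffer is actually touched
  }
  println("done; watch RSS / the direct buffer pool in VisualVM while this runs")
}
```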

We have no control over how our APIs are used by external clients, so we can't rely on/assume optimal use of connection pools.

We took a look at the PR that introduced the use of direct buffers, and, as noted in the comments by @djspiewak, the performance gains were marginal (almost no difference).

We created an internal fork of fs2, replaced the use of direct buffers with indirect (heap) buffers, and confirmed that memory use is now stable and predictable and our services no longer restart.
We understand that direct buffers should offer a performance benefit, but the unpredictable memory behavior is probably too high a price to pay, especially when it translates into service restarts in production.
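Conceptually, the change in our fork amounts to preferring heap-backed buffers over direct ones. A minimal sketch of the two allocation modes at the NIO level (not the fs2 code itself):

```scala
import java.nio.ByteBuffer

object BufferKinds {
  // Direct buffer: backed by native (off-heap) memory, counted against RSS,
  // and released only when its ByteBuffer object is GC'd (or cleaned manually).
  val direct: ByteBuffer = ByteBuffer.allocateDirect(16 * 1024)

  // Heap (indirect) buffer: an ordinary byte[] on the JVM heap, so its memory
  // use is bounded by -Xmx and reclaimed under normal GC pressure.
  val heap: ByteBuffer = ByteBuffer.allocate(16 * 1024)
}
```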

Here is a GitHub repo to reproduce our experiments: ember-fs2-demo
Here are some details of the configuration we use:

Kubernetes Pod configuration:
Limits:
  memory: 2Gi
Requests:
  memory: 2Gi
  cpu: 500m

JAVA_OPTS:
-XX:+PrintFlagsFinal
-Djava.rmi.server.hostname=127.0.0.1
-server
-Xms1340M
-Xmx1340M
-XX:MaxMetaspaceSize=256M
-XX:MetaspaceSize=256M
-XX:ReservedCodeCacheSize=240M
-Xss1M
-XX:MaxGCPauseMillis=100
-XX:+UseG1GC

Versions:

java version: openjdk-11.0.14-jdk-slim

Ember dummy project versions:
cats-effect:3.5.2
http4s-ember-server:0.23.24
fs2:3.9.3

Blaze dummy project versions:
http4s-blaze-server:0.23.15

@djspiewak
Member

Wow this is an absolute wealth of investigative data, thank you.

@mpilquist
Member

Agreed, fantastic work!

Thoughts on switching entirely to indirect buffers versus making the buffer type configurable (and defaulting to indirect)?

@armanbilge
Member

So, for the record, the JDK does fast-path direct buffers. Of course, this doesn't mean the performance difference is measurable, or that the trade-off is worth it with respect to predictability of memory use.

https://github.com/openjdk/jdk/blob/93bedd7abae33f5d5eb909d3d216ee415ad2f8b2/src/java.base/share/classes/sun/nio/ch/IOUtil.java#L286-L294
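For illustration only (the copy behavior is a JDK implementation detail, visible in the IOUtil source linked above): when a heap buffer is written to a channel, the JDK copies it into a pooled temporary direct buffer before the native write, whereas a direct buffer is handed straight to the native call.

```scala
import java.nio.ByteBuffer
import java.nio.channels.SocketChannel

object HeapWriteSketch {
  // Writing a heap-backed buffer to a channel: sun.nio.ch.IOUtil (linked
  // above) copies it into a cached temporary *direct* buffer before the
  // native write. Passing an already-direct buffer skips that extra copy,
  // which is the "fast path" being referred to.
  def writeAll(channel: SocketChannel, data: Array[Byte]): Unit = {
    val buf = ByteBuffer.wrap(data) // heap buffer, backed by the byte array
    while (buf.hasRemaining) channel.write(buf)
  }
}
```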


Other projects, such as Netty, Lucene, and Hadoop, use sun.misc.Unsafe.invokeCleaner to manually manage the lifecycle of direct ByteBuffers.

https://github.com/netty/netty/blob/7ad4c4b0795b71f46577cb1bb4c6f4244e209a89/common/src/main/java/io/netty/util/internal/CleanerJava9.java#L46

https://github.com/apache/lucene/blob/8bee41880e41024109bf5729584ebc5dd1003717/lucene/core/src/java/org/apache/lucene/store/MappedByteBufferIndexInputProvider.java#L167

https://github.com/apache/hadoop/blob/2f1e1558b6fc3b15fc753d029212268699ad0ae3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/CleanerUtil.java#L97-L98

We've generally avoided reaching for hacks like that in CE/FS2. However, it would give us a mechanism to safely use direct ByteBuffers, and in the cases where we are unable to use those internal APIs we could fall back to indirect ByteBuffers.
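A minimal sketch of that manual-cleanup approach, assuming Java 9+ where sun.misc.Unsafe#invokeCleaner exists (this is not something fs2 currently does; names here are illustrative):

```scala
import java.nio.ByteBuffer

object DirectBufferCleaner {
  // Obtain the Unsafe singleton reflectively, as Netty/Lucene/Hadoop do.
  private val unsafe: sun.misc.Unsafe = {
    val field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[sun.misc.Unsafe]
  }

  // Release the native memory behind a direct ByteBuffer immediately instead
  // of waiting for a GC cycle. Must only be called on a buffer that is direct,
  // is not a slice/duplicate, and is no longer referenced anywhere else.
  def free(buffer: ByteBuffer): Unit =
    if (buffer.isDirect) unsafe.invokeCleaner(buffer)
}

object CleanerDemo extends App {
  val buf = ByteBuffer.allocateDirect(64 * 1024)
  // ... use buf for I/O ...
  DirectBufferCleaner.free(buf) // native memory reclaimed right away
}
```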

@danicheg
Member

Perhaps changing the GC from G1 to Shenandoah could also be helpful here. On JDK 11 it's available since 11.0.9, so you should be able to try it out.
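If I recall correctly, the JDK 11 backport keeps Shenandoah behind an experimental flag, so (an assumption, not something verified in this thread) the JAVA_OPTS change would be roughly:
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC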

@mpilquist
Member

Fixed in #3371
