Usage of direct buffers on FS2 Socket causes an unpredictable memory use on http4s with embers backend #3370
Wow, this is an absolute wealth of investigative data, thank you.
Agreed, fantastic work! Thoughts on switching entirely to indirect buffers versus making the buffer type configurable (and default to indirect)?
So for the record, the JDK does fast-path direct buffers. Of course this doesn't mean the performance difference is measurable or that the trade-off is worth it wrt predictability of memory use. Other projects such as Netty, Lucene, and Hadoop use […]. We've generally avoided reaching for hacks like that in CE/FS2. However, it would provide us a mechanism to safely use direct buffers.
Perhaps changing the GC from G1 to Shenandoah could also help here. On JDK 11 it has been available since 11.0.9, so you should be able to try it out.
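For reference, a hedged sketch of the flags this suggestion implies (replacing `-XX:+UseG1GC` in the JAVA_OPTS shown further down; whether the experimental-options flag is required depends on the particular JDK 11 build):

```
# Assumed change to try Shenandoah instead of G1.
# Some JDK 11 builds require unlocking experimental options first.
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC
```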
Fixed in #3371 |
This issue is a follow-up to http4s issue #7343.
While recently updating our services, we bumped our dependencies to the latest versions of the Typelevel stack and replaced the http4s backend Blaze with Ember, as suggested in the official documentation.
After the upgrade, we started receiving alerts that our services were being restarted from time to time.
We took a close look at the metrics and noticed that the services were killed due to native memory usage: the resident set size (RSS) reached the native memory limit, while heap memory was nowhere near its limit (see figure below). An OOM exception was never recorded.
We analyzed the memory and thread dumps of the services and didn’t find any clear indication of a memory leak.
Our observations pointed to odd behavior of the direct buffers, so we decided to compare the http4s backends Blaze and Ember.
For each http4s backend, we defined a dummy http4s server with one simple endpoint (/ping → Ok("pong")) and simulated a large number of concurrent requests (1M total requests, thread pool size of 10). While doing so, we recorded the direct buffer behavior via VisualVM (see figures below).
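For illustration, a minimal sketch of the kind of dummy Ember server described above (the object name, host, and port are our own assumptions; the actual code lives in the repo linked below):

```scala
import cats.effect.{IO, IOApp}
import com.comcast.ip4s._
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.implicits._

// Dummy server with a single /ping endpoint, as used in the experiments.
object PingServer extends IOApp.Simple {

  private val routes = HttpRoutes.of[IO] { case GET -> Root / "ping" =>
    Ok("pong")
  }

  def run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(host"0.0.0.0")
      .withPort(port"8080")
      .withHttpApp(routes.orNotFound)
      .build
      .useForever
}
```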
💡 As can be seen, Blaze keeps its direct buffer usage stable while Ember's keeps increasing. If the DirectBuffers are not released by a garbage collection cycle fast enough (e.g. while there is still enough heap memory available), this leads to process termination, as described above.
It is worth stressing that if we force a GC run the memory is released, which explains why we couldn't find any hints of a memory leak during our analysis of the memory and thread dumps. The restarts occur when the JVM heap is not under pressure, so no GC cycle is triggered, yet native memory reaches its limit due to the steady growth of DirectBuffers.
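The direct buffer pool can also be observed programmatically through the same MBean that VisualVM reads; a small sketch (our own helper, not part of the demo repo):

```scala
import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import scala.jdk.CollectionConverters._

// Print the stats of the JVM's "direct" buffer pool; VisualVM's
// buffer monitor reads the same platform MBean.
object DirectPoolStats extends App {
  ManagementFactory
    .getPlatformMXBeans(classOf[BufferPoolMXBean])
    .asScala
    .filter(_.getName == "direct")
    .foreach { pool =>
      println(
        s"direct buffers: count=${pool.getCount}, " +
          s"used=${pool.getMemoryUsed} B, capacity=${pool.getTotalCapacity} B"
      )
    }
}
```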
We have no control over how our APIs are used by external clients, so we cannot rely on or assume optimal use of connection pools.
We took a look at the PR that introduced the use of direct buffers, and as noted in the comments by @djspiewak, the performance gains were marginal (almost no difference).
We created an internal fork of fs2, replaced the use of direct buffers with indirect buffers, and confirmed that memory use is stable and predictable and that our services no longer restart.
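At the JDK level the change boils down to which `ByteBuffer` factory is used; a hedged illustration (the actual fs2 change landed in #3371, not in this snippet):

```scala
import java.nio.ByteBuffer

// Direct buffers live in native (off-heap) memory; the native allocation
// is only reclaimed once the GC collects the owning ByteBuffer object,
// so RSS can grow even while the heap is far from full.
val direct: ByteBuffer = ByteBuffer.allocateDirect(16 * 1024)

// Heap ("indirect") buffers are ordinary objects: their memory counts
// against -Xmx and is reclaimed like any other garbage.
val heap: ByteBuffer = ByteBuffer.allocate(16 * 1024)
```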
We understand that the usage of direct buffers is meant to offer a performance benefit, but the unpredictable memory behavior is probably too high a price to pay, especially when it translates into service restarts in production.
Here is a GitHub repo to reproduce our experiments: ember-fs2-demo
Here are some details of the configuration we use:
Kubernetes Pod configuration:

```
Limits:
  memory: 2Gi
Requests:
  memory: 2Gi
  cpu: 500m
```
JAVA_OPTS:

```
-XX:+PrintFlagsFinal
-Djava.rmi.server.hostname=127.0.0.1
-server
-Xms1340M
-Xmx1340M
-XX:MaxMetaspaceSize=256M
-XX:MetaspaceSize=256M
-XX:ReservedCodeCacheSize=240M
-Xss1M
-XX:MaxGCPauseMillis=100
-XX:+UseG1GC
```
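Worth noting (our addition, not part of the configuration above): when `-XX:MaxDirectMemorySize` is unset, HotSpot caps direct buffer memory at roughly the `-Xmx` value, so an explicit cap would make the JVM fail fast instead of the pod being OOM-killed by Kubernetes:

```
# Hypothetical addition: cap native memory used by direct buffers, so the
# JVM throws "OutOfMemoryError: Direct buffer memory" before the pod
# hits its RSS limit (the value here is illustrative only).
-XX:MaxDirectMemorySize=256M
```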
Versions:
- Java version: openjdk-11.0.14-jdk-slim

Ember dummy project versions:
- cats-effect: 3.5.2
- http4s-ember-server: 0.23.24
- fs2: 3.9.3

Blaze dummy project versions:
- http4s-blaze-server: 0.23.15