
Support for virtual threads (Java 21) #9534

Open
trask opened this issue Sep 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

trask commented Sep 21, 2023

Virtual threads are out of preview and included in the new Java 21 (LTS) release (https://openjdk.org/jeps/444).

Notes/references from weekly Java meeting:

JonasKunz commented Sep 22, 2023

Here's my understanding of this topic; I implemented the virtual thread support in the elastic apm agent.

Thread Locals

The usage of thread locals prior to virtual threads can be roughly divided into two categories:

  • A ThreadLocal is used as a way of implicitly passing values up or down the call stack. The OpenTelemetry Context is a prime example of this use case.
  • A ThreadLocal is used to pool an expensive, non-thread-safe resource.

The first use case is still fine with virtual threads: it is just a necessary tool, and the lifetime of the thread local is usually limited (e.g. via the closeable Scope).
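For reference, this is the short-lived pattern meant here, using the standard context API (the ContextKey is just an illustrative example):

```java
import io.opentelemetry.context.Context;
import io.opentelemetry.context.ContextKey;
import io.opentelemetry.context.Scope;

class ContextScopeExample {
  // illustrative key; real code would typically attach a Span or Baggage instead
  private static final ContextKey<String> TENANT = ContextKey.named("tenant-id");

  void handleRequest() {
    Context context = Context.current().with(TENANT, "acme");
    // The thread-local behind Context.current() only holds this context for the
    // duration of the try block and is restored when the Scope closes, so it is
    // cheap even on a virtual thread that serves a single request and then dies.
    try (Scope ignored = context.makeCurrent()) {
      doWork();
    }
  }

  String doWork() {
    return Context.current().get(TENANT); // "acme" while the scope is active
  }
}
```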

The second use case is the one which can be problematic. A good example is the TemporaryBuffers class from the SDK.
The buffer is allocated on first usage in a Thread and from then on lives until the thread dies.

With non-virtual threads this was super nice, especially because your typical web application would use a thread pool of worker threads. Once every thread in the pool has been used, TemporaryBuffers becomes allocation-free because the ThreadLocal has been initialized for every pooled thread.

With virtual threads the intended usage is to not pool threads, but to create a new virtual thread for each request.
This means that for each request a new TemporaryBuffers buffer is allocated on first usage. It could only be reused on further usage within the same request (= the same virtual thread). If your requests are short-lived, the pooling is somewhat ineffective, but the memory consumption would still be okay.
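For illustration, the pattern in question looks roughly like this (a simplified sketch, not the actual TemporaryBuffers code):

```java
// Simplified sketch of the ThreadLocal-as-pool pattern: the buffer is created
// lazily per thread and then lives as long as the thread itself. With a pooled
// platform thread that is effectively forever; with a per-request virtual
// thread it is only for one request, so every request pays the allocation again.
final class PerThreadBuffer {
  private static final ThreadLocal<char[]> BUFFER = new ThreadLocal<>();

  static char[] chars(int minLength) {
    char[] buffer = BUFFER.get();
    if (buffer == null || buffer.length < minLength) {
      buffer = new char[minLength];
      BUFFER.set(buffer);
    }
    return buffer;
  }

  private PerThreadBuffer() {}
}
```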

Where things get really bad is if the ThreadLocal unnecessarily prolongs the lifetime of the pooled object.
The best example here imo is a Java service used as an HTTP gateway/proxy. Previously you would typically use reactive programming here, whereas with virtual threads you can now easily go back to the thread-per-request model.

Imagine an HTTP gateway which just forwards HTTP requests to other services that take 10 seconds on average.
With 10k requests per second, this means that at any given time 100k requests are served concurrently, equating to 100k virtual threads mostly waiting on the response of the downstream HTTP service.
If every virtual thread holds on to a 4 KB TemporaryBuffer, this gives you about 400 MB of wasted heap memory.

Alternatives

  • Give up on pooling and live with the "expensive" resource creation (e.g. higher allocation rate)
    • In the elastic agent we did this for some char[] pools which were only used once per handled HTTP request IIRC
  • Use a true object pool instead of shoehorning ThreadLocals into this job (see the sketch after this list). This is not free though, as the thread-safe management usually requires atomic operations when acquiring/releasing objects.
    • In the elastic agent we use an object pool based on the MpmcAtomicArrayQueue from JCTools
  • Stay with ThreadLocal pooling if the pooled resource is very lightweight AND used very frequently, even during a single request
    • In the elastic apm agent we use a special WeakConcurrentMap (GH). That map requires a special key object for lookups, with overridden hashCode() and equals(). To avoid allocating one of those on every lookup, this key object is stored in a ThreadLocal and reused for every get() on the map. Because that object is both very small and used very heavily, we stayed with a ThreadLocal in this case.
  • Caution, hack: The JVM seems to have an internal "carrier thread local" (see this discussion on the jackson GH repo). This would be super dangerous to use as you would have to guarantee that your virtual thread is not unmounted while you are holding onto the pooled resource. In the future this might be even impossible if they add preemption to virtual threads.
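For illustration, a pool of the shape described in the object-pool alternative above could look like this; a bounded JDK queue stands in for JCTools' MpmcAtomicArrayQueue, so this is only a sketch of the acquire/release pattern, not our actual implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

// Minimal bounded object pool: acquire() falls back to allocation when the
// pool is empty, release() simply drops the object when the pool is full, so
// the pool never grows beyond `capacity` no matter how many (virtual) threads
// use it concurrently. Both operations cost one atomic queue operation.
final class ObjectPool<T> {
  private final ArrayBlockingQueue<T> pool; // offer/poll are non-blocking
  private final Supplier<T> factory;

  ObjectPool(int capacity, Supplier<T> factory) {
    this.pool = new ArrayBlockingQueue<>(capacity);
    this.factory = factory;
  }

  T acquire() {
    T pooled = pool.poll();
    return pooled != null ? pooled : factory.get();
  }

  void release(T object) {
    pool.offer(object); // silently discarded if the pool is already full
  }
}
```

A caller would then do `char[] buffer = pool.acquire(); try { ... } finally { pool.release(buffer); }`, trading the per-thread deadweight for two atomic operations per use.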

Suggestion for next steps

In my opinion it would make sense to go through both the Otel Agent and the SDK and to look at all usages of ThreadLocals.

  • If they are used for passing implicit parameters and have a short lifetime: great, nothing to do!
  • If they are used for pooling, decide which of the proposed alternatives would be the best fit

We (elastic) can definitely contribute the object pooling implementation and its JMH benchmark suite to Otel if desired. We've been using this pooling implementation for a few years now to minimize allocation rates prior to virtual threads, so it is battle tested.

Executors

The newly added executor factories are Executors.newVirtualThreadPerTaskExecutor() and Executors.newThreadPerTaskExecutor(ThreadFactory); I think the former builds on the latter. Chances are high that our instrumentation already covers these, so just adding tests for them should be good enough. This is at least how it was for the elastic agent.
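For example, a propagation check for the new executor could look roughly like this; Context.taskWrapping() is the explicit helper from the context API, and with the agent's executor instrumentation a plain submit() should behave the same way without it:

```java
import io.opentelemetry.context.Context;
import io.opentelemetry.context.ContextKey;
import io.opentelemetry.context.Scope;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class VirtualThreadExecutorExample {
  private static final ContextKey<String> KEY = ContextKey.named("example");

  public static void main(String[] args) throws Exception {
    // newVirtualThreadPerTaskExecutor() starts a fresh virtual thread per task;
    // taskWrapping() captures the caller's context and makes it current there.
    ExecutorService executor =
        Context.taskWrapping(Executors.newVirtualThreadPerTaskExecutor());
    try (Scope ignored = Context.current().with(KEY, "value").makeCurrent()) {
      executor.submit(() -> System.out.println(Context.current().get(KEY))).get(); // prints "value"
    } finally {
      executor.shutdown();
    }
  }
}
```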

JonasKunz commented Oct 5, 2023

I looked for all usages of ThreadLocal in the API, SDK and Agent and tried to classify which usages might require action to handle virtual threads nicely.

Usages that might need work

API: TemporaryBuffers

  • All usages except one acquire a very small buffer (<64 chars, so <128 bytes)
  • The only "large" buffer is allocated by PercentEscaper (1000 chars, so 2000 bytes)
  • Proposed Solution:
    • Stay with thread locals for <64 chars for performance
    • Use a char[] object pool for PercentEscaper (rough sketch after this list). The two atomic operations for pool management shouldn't add much overhead relative to the percent-escaping logic; it is only on a slow path anyway
    • Not too urgent for now, as this code path is only entered when actually requiring encoding (e.g. special characters in propagated baggage)
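A rough sketch of how that split could look; the class name, the placement of the 64-char threshold and the pool capacity are hypothetical placeholders, not tuned values:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical split: tiny buffers stay in a ThreadLocal (acceptable deadweight
// even with one buffer per virtual thread), while the large PercentEscaper-sized
// buffer is taken from a small shared pool and returned after use.
final class EscaperBuffers {
  private static final int SMALL_LIMIT = 64;  // chars kept per thread
  private static final int LARGE_SIZE = 1000; // chars, PercentEscaper's buffer size
  private static final ThreadLocal<char[]> SMALL = new ThreadLocal<>();
  private static final ArrayBlockingQueue<char[]> LARGE_POOL = new ArrayBlockingQueue<>(32);

  static char[] acquire(int minLength) {
    if (minLength <= SMALL_LIMIT) {
      char[] buffer = SMALL.get();
      if (buffer == null) {
        buffer = new char[SMALL_LIMIT];
        SMALL.set(buffer);
      }
      return buffer;
    }
    char[] pooled = LARGE_POOL.poll();
    if (pooled != null) {
      if (pooled.length >= minLength) {
        return pooled;
      }
      LARGE_POOL.offer(pooled); // too small for this caller, put it back
    }
    return new char[Math.max(minLength, LARGE_SIZE)];
  }

  static void release(char[] buffer) {
    if (buffer.length >= LARGE_SIZE) {
      LARGE_POOL.offer(buffer); // dropped if the pool is already full
    }
  }

  private EscaperBuffers() {}
}
```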

SDK: JdkHttpSender

  • ThreadLocal<ByteBufferPool> is very heavy-weight
  • Should be okay though, because this code path is only executed for a private thread-pool with five threads
  • We would only need to adapt this if we plan to replace that thread-pool with virtual threads at some time, because then the pooling will become ineffective (but no memory leak due to prolonged lifetime)

SDK: ProtoSerializer

For the agent I think no changes are needed; all usages seem fine to me!

Usages looking fine to me

I checked the following usages across API, SDK and Agent but classified them as non-problematic for virtual threads:

API:

SDK:

  • WeakConcurrentMap: ThreadLocal<LookupKey<?>>
    • Expected to be used very frequently, so an object pool would be bad for performance imo
    • Object is very small, so okay to have as "deadweight" even for virtual threads

Agent:

TL;DR

  • All ThreadLocal usages in the agent are fine
  • Under the assumption that exporters are not using virtual threads we are good in the SDK
  • For the API we only need to look at the TemporaryBuffers usage by PercentEscaper.
    • Imo it would be worth replacing this usage with an object-pool instead
    • Not too urgent, as excessive memory usage would be an edge case even with lots of virtual threads

Some feedback on whether you agree / disagree would be great!
