
Add virtual threads support #224

Merged: 35 commits into line:master on Mar 6, 2024

Conversation

kawamuray
Member

@kawamuray kawamuray commented Jan 17, 2024

This PR introduces Virtual Thread support, which we expect to have very high affinity with the typical workload implemented on Decaton (I/O heavy).
We currently provide DeferredCompletion (async processing) to support DecatonProcessor implementations that leverage the asynchronous paradigm, but it has been a source of completion leaks and consequent consumption stalls; virtual threads could be a perfect solution that eliminates this whole class of problems.

As this involves a lot of changes to the existing code base, here is the (likely complete) list of changes made besides the newly added virtual thread support:

  • Logback version upgrade - necessary to cope with vthreads
  • Gradle upgrade - necessary to build/run with Java 21 or higher
  • Use of synchronized => ReentrantLock - vthreads and synchronized do not work well together (carrier-thread pinning)
  • Merge TaskMetrics and ProcessMetrics into PerPartitionMetrics for better organization
  • Deprecate the AsyncShutdownable interface and add AsyncClosable as a refactoring
  • PartitionProcessor is now called SubPartitions, which is an interface with possibly many implementations. Not intending to let users implement this interface, though.
  • Add a --latency-count option to the benchmark. This simulates multiple I/Os during the processing of a task, to count context-switching costs in.
  • The benchmark's DecatonProcessor scope has been changed from THREAD to PROVIDED for a fair result on the VIRTUAL_THREAD runtime
  • Benchmark runs now use a fixed heap size (8 GB), the -server option, and the -Xcomp option to minimize impact from JIT compilation
  • Apply async-profiler twice per benchmark, to avoid the real-run measurement being interfered with by deoptimization caused by loading the JVMTI agent
  • Remove all micrometer meter instantiations in ProcessorUnit, as profiling found them to be very expensive
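To illustrate the synchronized => ReentrantLock item: a virtual thread that parks inside a synchronized block pins its carrier thread, while parking under a ReentrantLock lets it unmount. A minimal sketch of the pattern (class and method names are illustrative, not taken from the PR):

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: the replacement pattern for synchronized methods.
// Parking while holding a ReentrantLock lets a virtual thread unmount
// from its carrier; parking inside a synchronized block pins the carrier.
public class PinningSafeCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count;

    // Before: public synchronized void increment() { count++; }
    public void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }

    public long count() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```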

Regarding performance, I ran several benchmarks using the benchmark module.
The benchmark simulates a workload with 5 I/O operations per task (forcing at least 5 context switches), with varying I/O latency (off-CPU time).

First, to find the maximum performance obtainable from the current THREAD_POOL runtime, I searched for the optimal value of decaton.partition.concurrency.

Command:

    ./debm.sh \
        --title "Decaton" \
        --runner com.linecorp.decaton.benchmark.DecatonRunner \
        --runs 2 \
        --format json \
        --tasks 10000 \
        --warmup 100000 \
        --simulate-latency=4 \
        --latency-count=5 \
        --param=decaton.partition.concurrency=$conc \
        --param=decaton.subpartition.runtime=THREAD_POOL | tee vthread-bm/threadpool-conc-$conc.json

Result:
[image: throughput vs. decaton.partition.concurrency]

It turned out that decaton.partition.concurrency=300 is the peak-performance setting.

Then I ran the same workload with increasing I/O latency against both the THREAD_POOL and VIRTUAL_THREAD modes.
Command:

    ./debm.sh \
        --title "Decaton" \
        --runner com.linecorp.decaton.benchmark.DecatonRunner \
        --runs 3 \
        --format json \
        --tasks 10000 \
        --warmup 100000 \
        --simulate-latency=$latency \
        --latency-count=5 \
        --param=decaton.partition.concurrency=300 \
        --param=decaton.subpartition.runtime={THREAD_POOL|VIRTUAL_THREAD}

Result:
[image: throughput vs. simulated I/O latency, THREAD_POOL vs. VIRTUAL_THREAD]

As the charts above show, the THREAD_POOL runtime's throughput decreases as the total I/O latency increases, while with the VIRTUAL_THREAD runtime throughput remains fairly stable, mitigating the impact of I/O latency.

@kawamuray kawamuray requested a review from ocadaruma January 17, 2024 11:21
@ocadaruma
Member

@kawamuray Should I start review while the PR is still in draft state?

@kawamuray
Member Author

@ocadaruma yes please, from rough design discussions perhaps.

Member

@ocadaruma ocadaruma left a comment


Left only nit comments.
I didn't review the test code and the benchmark's code in detail, but overall it looks good!

Can't wait to release this feature!

One concern is CI on JDK 8/11/17.
IMO we should still run tests on these versions because many users still run them.

After this PR gets merged, I plan to submit a follow-up PR to bring back the 8/11/17 test env.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentHashSet<E> implements Set<E> {
Member


Why not use ConcurrentHashMap.newKeySet?

Member Author


Oh! I didn't know that API existed, good to know, will fix that.
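For reference, the API suggested above: ConcurrentHashMap.newKeySet() (available since Java 8) returns a concurrent Set view backed by a ConcurrentHashMap, so no hand-rolled ConcurrentHashSet is needed. A minimal usage sketch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NewKeySetExample {
    public static void main(String[] args) {
        // A thread-safe Set backed by a ConcurrentHashMap; no custom
        // ConcurrentHashSet implementation required.
        Set<String> partitions = ConcurrentHashMap.newKeySet();
        partitions.add("topic-0");
        partitions.add("topic-1");
        partitions.add("topic-0"); // duplicate, ignored
        System.out.println(partitions.size()); // prints 2
    }
}
```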

@kawamuray kawamuray marked this pull request as ready for review February 9, 2024 08:53
@kawamuray kawamuray requested a review from ocadaruma February 10, 2024 01:55
Member

@ocadaruma ocadaruma left a comment


Apart from comments, AveragingRateLimiter still uses synchronized. We should get rid of it too.

Thread.sleep(rand.nextInt(10));
}))
.propertySupplier(StaticPropertySupplier.of(
Property.ofStatic(ProcessorProperties.CONFIG_PARTITION_CONCURRENCY, 16)
Member


In VThreadCoreFunctionalityTest, there's no point in specifying partition concurrency, right?

Member Author


Good catch, will remove that.

.publishPercentiles(0.5, 0.9, 0.99, 0.999)
.register(registry));

public class PerPartitionMetrics extends AbstractMetrics {
Member


+1 for merging TaskMetrics and ProcessMetrics, but PerPartitionMetrics sounds confusing because it suggests a class containing all partitionScope metrics, which it is not.
Since all the metrics inside this class are about tasks, how about TaskMetrics?

protected final PerPartitionMetrics perPartitionMetrics;
protected final SchedulerMetrics schedulerMetrics;

public AbstractSubPartitions(PartitionScope scope, Processors<?> processors) {
Member


[nits] Let's use protected since this is an abstract class

try {
return unit.asyncClose()
.thenApply(ignored -> null) // To migrate type from Void to Object
.completeOnTimeout(TIMEOUT_INDICATOR,
Member


completeOnTimeout was introduced in Java 9, so it shouldn't be used here.

Member Author


Ugh, good catch. Now I need to figure out how to do this without making this part look messy...
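One Java 8-compatible approach (a sketch; the helper class name and scheduler wiring are assumptions for illustration) is to emulate completeOnTimeout with a ScheduledExecutorService that completes the future with a fallback value if it is still pending when the timeout fires:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public final class Timeouts {
    // Java 8 substitute for CompletableFuture.completeOnTimeout (Java 9+):
    // schedule a task that completes the future with a fallback value if it
    // is still pending when the timeout fires.
    public static <T> CompletableFuture<T> completeOnTimeout(
            CompletableFuture<T> cf, T value,
            long timeout, TimeUnit unit, ScheduledExecutorService scheduler) {
        ScheduledFuture<?> cancelFut = scheduler.schedule(() -> {
            if (!cf.isDone()) {
                cf.complete(value);
            }
        }, timeout, unit);
        // Drop the timer task once the future completes, to free the slot early.
        cf.whenComplete((v, e) -> cancelFut.cancel(false));
        return cf;
    }
}
```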

@@ -68,36 +65,33 @@ private void processTask(TaskRequest request) {
return;
}

Timer timer = Utils.timer();
CompletionStage<Void> processCompletion = CompletableFuture.completedFuture(null);
Member


Might not be a problem, but this creates an unnecessary CompletableFuture.completedFuture instance for every task, even when the task doesn't end up throwing an exception.

Let's supply the completedFuture inside the catch block?

Member Author


Good point!
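The suggested change, sketched (the processor interface and call here are illustrative, not the PR's exact code): allocate the completed future only on the exceptional path, so the happy path creates no extra object per task:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class CatchBlockSketch {
    // Hypothetical stand-in for the processing call made per task.
    interface Processor {
        CompletionStage<Void> process(String task);
    }

    static CompletionStage<Void> processTask(Processor processor, String task) {
        try {
            // Happy path: no extra CompletableFuture instance is allocated.
            return processor.process(task);
        } catch (RuntimeException e) {
            // Only the exceptional path pays for the completed-future allocation.
            return CompletableFuture.completedFuture(null);
        }
    }
}
```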

private Processors<?> processors;

@Test
public void testCleanupPartiallyInitializedUnits() throws Exception {
Member


Why is this test deleted?


@kawamuray
Member Author

Apart from comments, AveragingRateLimiter still uses synchronized. We should get rid of it too.

That's right. I knew about it but left it as-is intentionally, since it has almost no risk of conflicting with vthreads.
Replacing every usage of synchronized with locks has the pro of eliminating any risk of conflicting with vthreads, but the con of requiring more lines of code to do the same thing.
So I thought that for methods that are reasonably small, closed, and have no blocking work inside, it might be better to leave synchronized as-is.
Different thoughts?


@ocadaruma
Member

So I thought that for methods that are reasonably small, closed, and have no blocking work inside, it might be better to leave synchronized as-is.

Thank you for the explanation. Fair enough. Sounds good to keep synchronized then
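As a rule of thumb from the exchange above: synchronized only causes trouble under virtual threads if the thread parks while holding the monitor, so a short, non-blocking critical section can stay as-is. An illustrative example (not code from the PR) of the kind of method that is safe to keep synchronized:

```java
// Illustrative: a short, closed critical section with no blocking work.
// A virtual thread pins its carrier only while inside the monitor; since
// nothing here parks (no I/O, no sleeps), keeping synchronized costs
// little and stays simpler than the ReentrantLock equivalent.
public class WindowedCounter {
    private long windowStart;
    private long count;

    public synchronized void record(long nowMillis) {
        if (nowMillis - windowStart >= 1000) {
            // Start a new one-second window.
            windowStart = nowMillis;
            count = 0;
        }
        count++;
    }

    public synchronized long count() {
        return count;
    }
}
```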

@kawamuray
Member Author

Applied all feedback. PTAL

@kawamuray kawamuray requested a review from ocadaruma February 19, 2024 05:38
Member

@ocadaruma ocadaruma left a comment


Left only minor comment

}
ScheduledFuture<?> cancelFut = scheduledExecutor.schedule(() -> {
if (!cf.isDone()) {
cf.complete(value);
Member


Since this completion handler may include a destroyThreadProcessor call, which is potentially costly, shouldn't we have a pool size larger than 1 (like the number of available processors)?

Member Author


IIUC, the first argument is "corePoolSize", so my understanding of the behavior is: when there's a still-running scheduled task at the time the next scheduled execution occurs, the executor creates a new thread to process it (and the thread count is unbounded), so a scheduled task execution never blocks the others.

Member Author


hm, I think your concern is correct. fixing.
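For context on why the concern holds: a ScheduledThreadPoolExecutor runs with a fixed number of threads equal to corePoolSize and queues excess tasks rather than spawning new threads, so a size-1 scheduler serializes potentially costly completion handlers. A sketch of sizing by processor count (an assumption about the shape of the fix, not the PR's exact code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class SchedulerSizing {
    public static ScheduledExecutorService newTimeoutScheduler() {
        // ScheduledThreadPoolExecutor never grows past corePoolSize; extra
        // tasks wait in its internal queue. Sizing by available processors
        // keeps one slow completion handler from delaying all the others.
        int threads = Runtime.getRuntime().availableProcessors();
        return Executors.newScheduledThreadPool(threads);
    }
}
```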

@kawamuray kawamuray requested a review from ocadaruma February 29, 2024 04:39
Member

@ocadaruma ocadaruma left a comment


LGTM. Magnificent!

@ocadaruma ocadaruma merged commit 83be0df into line:master Mar 6, 2024
2 checks passed
@kawamuray kawamuray added new feature Add a new feature breaking change Breaking change for a public API labels Mar 7, 2024