Optimize observability and debugging experience #3901

Stephan202 · 2024-10-03T10:34:52Z

Performance is improved in two ways:

By invoking Scannable#stepName() only when Attr.NAME is not
explicitly set.
By optimizing Traces#extractOperatorAssemblyInformationParts, to
which several Scannable#stepName() implementations delegate.

The Scannable#name() logic may be executed many times, e.g. if a hot
code path uses {Mono,Flux}#log or Micrometer instrumentation. The
added benchmark shows that for large stack traces, the new Traces
implementation is several orders of magnitude more efficient in terms of
compute and memory resource utilization.

Deferral of invocation of Scannable#stepName() assumes that said
method does not have side-effects. This is true for all built-in
implementations.
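
For illustration, here is a minimal sketch of the deferral idea. It is not Reactor's actual code: `LazyNameSketch`, `resolveName` and `expensiveStepName` are hypothetical names, and the explicit name is modeled as a nullable `String`. The fallback is computed only when no explicit name is present, which is safe only if the fallback has no side effects.

```java
import java.util.function.Supplier;

final class LazyNameSketch {
	/** Counts how often the (hypothetical) expensive fallback was computed. */
	static int stepNameInvocations = 0;

	/** Stand-in for an expensive, side-effect-free step name computation. */
	static String expensiveStepName() {
		stepNameInvocations++;
		return "map";
	}

	/** Returns the explicit name if present; only otherwise evaluates the fallback. */
	static String resolveName(String explicitName, Supplier<String> stepName) {
		return explicitName != null ? explicitName : stepName.get();
	}

	public static void main(String[] args) {
		// Explicit name set: the supplier is never invoked.
		System.out.println(resolveName("myName", LazyNameSketch::expensiveStepName));
		// No explicit name: the supplier is invoked exactly once.
		System.out.println(resolveName(null, LazyNameSketch::expensiveStepName));
		System.out.println(stepNameInvocations);
	}
}
```

Passing a `Supplier` rather than a precomputed value is what moves the cost off the path where an explicit name exists.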

While there:

Improve two existing benchmarks by utilizing the black hole to which
benchmark method return values are implicitly sent.
Unify reactor.core.publisher package matching to prefix matching.

Stephan202

Added some comments with context. This PR relates to #3900.

benchmarks/src/main/java/reactor/core/publisher/MonoAllBenchmark.java

benchmarks/src/main/java/reactor/core/publisher/TracesBenchmark.java

Stephan202 · 2024-10-03T10:44:11Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

 				|| stackTraceRow.startsWith("reactor.core.publisher.Mono.onAssembly")
-				|| stackTraceRow.equals("reactor.core.publisher.Mono.onAssembly")
 				|| stackTraceRow.equals("reactor.core.publisher.Flux.onAssembly")


This line removal is unrelated to the rest of the PR. The dropped line is redundant, as it is preceded by a predicate that matches in strictly more contexts. Question: should the Flux.onAssembly line (and some others below) be updated to also use startsWith?

I think they can all use startsWith instead of equals.

Alright, will do (and update the PR description to match this additional change).
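
To illustrate why the dropped line is redundant, here is a small sketch with a hypothetical class name and made-up sample rows. Any string that equals the prefix also starts with it, so the `equals` check can never add a match that `startsWith` does not already cover.

```java
final class PrefixMatchSketch {
	static final String PREFIX = "reactor.core.publisher.Mono.onAssembly";

	/** The original pair of predicates; the equals check never adds a match. */
	static boolean withRedundantCheck(String row) {
		return row.startsWith(PREFIX) || row.equals(PREFIX);
	}

	/** The simplified predicate after dropping the redundant line. */
	static boolean prefixOnly(String row) {
		return row.startsWith(PREFIX);
	}

	public static void main(String[] args) {
		// Sample rows are invented for illustration only.
		String[] rows = {
			"reactor.core.publisher.Mono.onAssembly",
			"reactor.core.publisher.Mono.onAssembly(Mono.java)",
			"reactor.core.publisher.Flux.map"
		};
		for (String row : rows) {
			System.out.println(row + " -> " + withRedundantCheck(row) + " / " + prefixOnly(row));
		}
	}
}
```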

reactor-core/src/main/java/reactor/core/publisher/Traces.java

Stephan202 · 2024-10-03T10:50:46Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

 	static String[] extractOperatorAssemblyInformationParts(String source) {
-		String[] uncleanTraces = source.split("\n");
-		final List<String> traces = Stream.of(uncleanTraces)
-		                                  .map(String::trim)
-		                                  .filter(s -> !s.isEmpty())
-		                                  .collect(Collectors.toList());
+		Iterator<String> traces = trimmedNonemptyLines(source);


The largest improvements in this PR come from these changes.

Key contributors to performance of the old implementation:

String#split accepts a regular expression. We're no longer performing the comparatively expensive operation of compiling regular expressions.

String#split allocates an array and substrings proportional to the provided input, covering a potentially large part of the input that does not at all influence the result of this method.

The Stream operation likewise processes irrelevant lines, and allocates a potentially large list.

The new implementation instead lazily iterates over the input, processing only relevant lines, and tracking only the two most-recently-seen lines.
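
The lazy approach can be sketched roughly as follows. This is an illustrative stand-in rather than the code in this PR: the class and method names are hypothetical, and the sketch still calls `String#trim` on each line. The point is that lines are located with `indexOf` one at a time, so input beyond the last line the caller asks for is never split or collected.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyLinesSketch {
	/** Lazily iterates over trimmed, non-empty lines; nothing is split up front. */
	static Iterator<String> lazyLines(String source) {
		return new Iterator<String>() {
			private int index = 0;
			// Prefetches a single line so that hasNext() needs no extra scanning.
			private String next = advance();

			@Override
			public boolean hasNext() {
				return next != null;
			}

			@Override
			public String next() {
				if (next == null) {
					throw new NoSuchElementException();
				}
				String current = next;
				next = advance();
				return current;
			}

			/** Finds the next non-empty trimmed line, or returns null at end of input. */
			private String advance() {
				while (index < source.length()) {
					int end = source.indexOf('\n', index);
					if (end < 0) {
						end = source.length();
					}
					String line = source.substring(index, end).trim();
					index = end + 1;
					if (!line.isEmpty()) {
						return line;
					}
				}
				return null;
			}
		};
	}

	public static void main(String[] args) {
		Iterator<String> lines = lazyLines("  first\n\n second \nthird");
		// Only the lines that are actually requested (plus one prefetched) are materialized.
		System.out.println(lines.next());
		System.out.println(lines.next());
	}
}
```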

Stephan202 · 2024-10-03T10:52:26Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+		if (isUserCode(currentLine)) {
+			// No line is a Reactor API line.
+			return new String[]{currentLine};
 		}


Some logic was moved around, but existing comments were relocated with it. This should aid review. It's nice that there was already good test coverage.

reactor-core/src/main/java/reactor/core/publisher/Traces.java

Stephan202 · 2024-10-03T10:56:50Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+	/**
+	 * Returns an iterator over all trimmed non-empty lines in the given source string.
+	 *
+	 * @implNote This implementation attempts to minimize allocations.
+	 */
+	private static Iterator<String> trimmedNonemptyLines(String source) {
+		return new Iterator<String>() {


This manually-crafted iterator feels a bit like "coding like it's 1999", but I didn't find a less-verbose alternative that doesn't impact overall code readability. (Had Guava been on the classpath, then I'd have opted to extend AbstractIterator.) Open to suggestions!

chemicL · 2024-10-04T08:30:35Z

Thanks for the PR. I will have a closer look. However, bear in mind this change won't make it into 3.7.0-RC1 but would target directly 3.7.0. I think that's ok as it's not an API change. Together with #3900 it is a behaviour change, but unless there were side effects in code they would also not be observed aside from the performance gains. I think it's worth noting some sort of warning for the stepName lazy evaluation when we do release notes.

Stephan202 · 2024-10-04T17:04:05Z

Thanks for the feedback @chemicL! I cherry-picked the commit from #3900 into this branch and rewrote the PR title and summary.

Stephan202 · 2024-10-04T21:39:51Z

@chemicL I just realized that the impact of this PR is likely over-stated, as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines. (In my defense: the unit tests of the modified code also seem to indicate that more lines may be expected.)

So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.

Stephan202 · 2024-10-07T05:27:36Z

as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines.

There's one other code path: CallSiteInfoAddingMethodVisitor passes a manually constructed two-line stack trace to Hooks#addCallSiteInfo.

So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.

I now have a POC for this locally. Against a (ReactorDebugAgent-using) benchmark of local code it isn't yet faster than the current PR (surprisingly; requires more investigation), but for the cleanest and likely fastest code, it'd be better if we can have CallSiteInfoAddingMethodVisitor pass the two constructed stack frames separately. Doing this requires adding (or modifying) a public Hooks method, which causes :reactor-core:japicmp to report an API compatibility failure. Is that acceptable, and if so, how can I make that change without failing the build?

chemicL · 2024-11-21T14:47:09Z

Hey, @Stephan202. I've been a bit busy lately but would like to revisit your PRs. Can you give an update on the above considerations? How impactful is this change or an alternative change from your PoC? I assume we would be able to alter the Hooks methods that are marked as deprecated with a huge warning they're for internal use. You only need to find japicmp configuration in build.gradle and add an entry to the methodExcludes = [] array.

Stephan202 · 2024-11-21T16:54:21Z

Thanks for the ping on this PR @chemicL; I meant to report back here. :shame:

I did try the alternative approach mentioned (including the Hooks customization), but when testing it against an internal benchmark (one that contains some Picnic-specific code, but mostly causes Reactor logic to be executed, with Reactor Debug Agent enabled), I consistently found the alternative approach to perform slightly worse. I lost quite some time over that, as it really defied (and still defies) my intuitions.

I can look into polishing that code and pushing it to an alternative branch for a second opinion; will try to find some time. That said, based on the above, my tentative suggestion would be to proceed with this PR as-is. (Except perhaps for trimming the JMH benchmark inputs, because as mentioned, in practice the code will generally parse at most two stack frames, rather than 1000.) If the alternative approach can be made more performant after all, that can be tackled in a follow-up PR.

Stephan202 · 2024-11-21T17:52:27Z

Rebased branch; applied 100% cleanly.

Stephan202 · 2024-11-21T18:13:21Z

The experiments I tried are on this messy branch. If desired I can clean it up, though it's a bit TBD when I'll have time to dive back into this topic.

chemicL · 2024-11-25T16:23:48Z

@Stephan202 I am just trying to understand whether the change is actually needed. As I understand you discovered that this will only be triggered for processing two stack frames (can you point to where it's limited to only 2?), therefore the benchmark is not relevant to the expected usage of this API, yes? And also, there is the risk of touching and changing a stable code base for not much benefit and potential regressions. Is there a possibility that these optimizations will have an actual effect on real world applications?

Stephan202 · 2024-11-25T17:56:35Z

@chemicL fair question! Based on earlier testing the answer is "yes, this is an improvement", but let me get back to you in the coming days with some more hard data. (Exact timing TBD.)

This logic may be executed many times, e.g. if a hot code path uses `{Mono,Flux}#log` or Micrometer instrumentation. The added benchmark shows that for large stack traces the new implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization. While there, improve two existing benchmarks by utilizing the black hole to which benchmark method return values are implicitly sent.

(cherry picked from commit 009ec89)

Stephan202 · 2024-11-27T21:58:56Z

@chemicL alright, I rebased the branch on main and added a small commit to make the benchmark more realistic. In the remainder of this post I'm comparing this branch to the current HEAD of main (7cc701c), such that improvements of #3902 apply in each case / don't bias the result.

For the TracesBenchmark in this PR, I locally get the following results:

Benchmark before the changes

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    93.857 ±   2.071   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  8859.998 ± 195.529  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   872.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   121.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   155.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5   122.244 ±   3.167   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  8050.975 ± 209.703  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   150.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5   143.698 ±   1.597   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  8175.946 ±  91.208  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   169.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5   121.232 ±   2.857   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  8117.946 ± 190.510  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   110.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   162.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5   155.369 ±   3.385   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  7561.889 ± 165.250  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   103.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   160.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5   166.787 ±   3.910   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  8142.041 ± 191.441  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5  1424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   166.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5   144.471 ±   2.355   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  8237.969 ± 134.557  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5  1248.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   112.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   164.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5   166.155 ±   1.974   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  8264.689 ±  98.186  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5  1440.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   112.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   147.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5   197.339 ±   3.674   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  7925.318 ± 147.817  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5  1640.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   108.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   146.000                ms

Benchmark after the changes

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     27.853 ±   1.429   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  12327.398 ± 629.891  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    360.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    168.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    251.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     47.004 ±   3.899   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  11690.405 ± 971.380  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    159.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    216.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     46.116 ±   3.718   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  11915.112 ± 943.275  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    162.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    209.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     56.545 ±   1.064   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5   9714.213 ± 182.537  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    133.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    196.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     45.934 ±   2.003   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11959.417 ± 520.615  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    163.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    221.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     49.027 ±   3.465   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11206.856 ± 785.266  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    153.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    214.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     47.588 ±   2.095   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10902.505 ± 477.988  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    148.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    200.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     47.097 ±   2.317   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  11016.262 ± 539.726  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    149.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    182.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     45.345 ±   2.225   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  11441.877 ± 557.287  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    156.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    214.000                ms

In short, this means a >3x speedup and more than halving of allocated memory for the most common 2-line case.
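
As a worked check of these figures, the sketch below recomputes the ratios from the two tables above. It assumes that the "2-line case" corresponds to the row with `reactorLeadingLines = 1` and `trailingLines = 1`; that mapping is an assumption, and the class and method names are hypothetical.

```java
final class SpeedupSketch {
	/** Ratio of the old to the new average time; values above 1 mean "faster". */
	static double speedup(double beforeNsPerOp, double afterNsPerOp) {
		return beforeNsPerOp / afterNsPerOp;
	}

	/** Fraction of the per-operation allocation that was eliminated. */
	static double allocationReduction(double beforeBytesPerOp, double afterBytesPerOp) {
		return 1.0 - afterBytesPerOp / beforeBytesPerOp;
	}

	public static void main(String[] args) {
		// Row (1, 1): 155.369 ns/op before, 45.934 ns/op after.
		System.out.printf("speedup: %.2fx%n", speedup(155.369, 45.934));
		// Row (1, 1): 1232 B/op before, 576 B/op after.
		System.out.printf("allocation reduction: %.0f%%%n",
				100 * allocationReduction(1232.0, 576.0));
	}
}
```

The other rows with two input lines, (0, 2) and (2, 0), give similar ratios when plugged into the same two methods.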

I also had another look at a "more representative" Picnic-internal benchmark, where some longer reactive chains are subscribed to. I can't easily share this code, but with the Reactor Debug Agent enabled (as we do in production) there's an 18-22% speedup and ~40% reduction in allocated memory:

Internal benchmark before the changes

Benchmark                                             (orderDepth)  (transformationStepCount)  Mode  Cnt       Score      Error   Units
TransformationBenchmark.transform                                5                         20  avgt    5     150.550 ±    2.812   us/op
TransformationBenchmark.transform:gc.alloc.rate                  5                         20  avgt    5    1830.314 ±   26.565  MB/sec
TransformationBenchmark.transform:gc.alloc.rate.norm             5                         20  avgt    5  288942.054 ± 1734.845    B/op
TransformationBenchmark.transform:gc.count                       5                         20  avgt    5     150.000             counts
TransformationBenchmark.transform:gc.time                        5                         20  avgt    5     184.000                 ms

Internal benchmark after the changes

Benchmark                                             (orderDepth)  (transformationStepCount)  Mode  Cnt       Score    Error   Units
TransformationBenchmark.transform                                5                         20  avgt    5     120.198 ±  0.804   us/op
TransformationBenchmark.transform:gc.alloc.rate                  5                         20  avgt    5    1358.328 ±  9.011  MB/sec
TransformationBenchmark.transform:gc.alloc.rate.norm             5                         20  avgt    5  171202.277 ± 18.866    B/op
TransformationBenchmark.transform:gc.count                       5                         20  avgt    5     111.000           counts
TransformationBenchmark.transform:gc.time                        5                         20  avgt    5     141.000               ms

Our main store application is a modular monolith that makes very heavy use of Reactor, with some key request flows creating very long reactive chains. I'm reasonably confident we'll see a noticeable latency improvement with this change.

chemicL · 2024-11-28T10:26:28Z

Wow, that's really impressive @Stephan202 🥇
Thanks for collecting and sharing the results, I'll get to the review then 🚀

Stephan202

Added a commit. Tnx for the review @chemicL!

reactor-core/src/main/java/reactor/core/publisher/Traces.java

Stephan202 · 2024-11-28T12:12:49Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

 				|| stackTraceRow.startsWith("reactor.core.publisher.Mono.onAssembly")
-				|| stackTraceRow.equals("reactor.core.publisher.Mono.onAssembly")
 				|| stackTraceRow.equals("reactor.core.publisher.Flux.onAssembly")


Alright, will do (and update the PR description to match this additional change).

Stephan202 · 2024-11-28T12:29:14Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+				if (index >= source.length()) {
+					return null;
+				}


Just realized that we can drop this case 👁️

Ah, yes. I didn't finish reviewing yesterday and it seems we noticed the same thing :)

Stephan202 · 2024-11-28T20:10:37Z

The Java 11 tests failed, but ./gradlew :reactor-core:java11Test --no-daemon passes for me locally. Perhaps a flaky test?

chemicL · 2024-11-28T12:00:08Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

@@ -29,6 +28,7 @@
 * @author Sergei Egorov
 */
 final class Traces {
+	private static final String PUBLISHER_PACKAGE_PREFIX = "reactor.core.publisher.";


Let's move the private constant below the package-private ones.

reactor-core/src/main/java/reactor/core/publisher/Traces.java

chemicL · 2024-11-29T12:05:09Z

Thanks. I have some minor comments and we're good to go. The test failure is a flaky test indeed, unrelated to this change.

I think we can merge this since this is a great improvement. As a side discussion: have you considered using CharBuffer to avoid String allocations altogether? Currently, if there are more lines in the stack trace, they will be parsed even if the previous line is eventually used, due to the nature of the iterator. Were we able to avoid allocating another String, this would be even faster. If you think there's room for improvement, we can explore this further in another PR.

chemicL · 2024-11-29T12:33:56Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+			}
+
+			@Nullable
+			private String getNextLine() {


Here's an outline of an idea to avoid using String and get the next line:

    // Assume the entire input (String source) is wrapped in a CharBuffer.
    CharBuffer cb = CharBuffer.wrap(source);
    // Scan position; kept outside the method so that successive calls advance.
    int i = 0;

    @Nullable
    private CharBuffer getNextLine() {
        while (i < cb.length()) {
            if (Character.isWhitespace(cb.charAt(i))) {
                i++; // Skip leading whitespace and empty lines.
                continue;
            }
            int end = i + 1;
            while (end < cb.length() && cb.charAt(end) != '\n') {
                end++;
            }
            CharBuffer line = cb.subSequence(i, end);
            i = end + 1;
            return line;
        }
        return null; // No more lines.
    }

The match for the reactor package name can also be done in the same linear scanning manner. I wonder if it'd be faster.

I had a look at this, and went down the rabbit hole. The TL;DR is that I added three commits that further improve performance:

One that avoids String#join, unrelated to your suggestion here.

One that replaces String#trim, such that the original string's underlying char[] is always reused, thanks to the implementation of String#substring. This is IIUC an alternative to your suggestion to use CharBuffer; I couldn't use the latter, as it lacks operations such as .startsWith and indexOf.

One that introduces a custom Substring class and is "somehow" even more performant.
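
To make the idea of a string view more concrete, here is a hypothetical sketch. It is not the `Substring` class from the commit, whose actual shape is not shown in this thread; all names and methods below are assumptions. The view stores only the source string and two offsets, so narrowing it or checking a prefix copies no characters.

```java
final class SubstringSketch {
	/** Hypothetical view over a region of a String; no characters are copied. */
	static final class Substring {
		private final String source;
		private final int start;
		private final int end;

		Substring(String source, int start, int end) {
			this.source = source;
			this.start = start;
			this.end = end;
		}

		/** Returns a narrower view that excludes leading and trailing whitespace. */
		Substring trim() {
			int s = start;
			int e = end;
			// Mirrors String#trim: characters up to and including ' ' are stripped.
			while (s < e && source.charAt(s) <= ' ') {
				s++;
			}
			while (e > s && source.charAt(e - 1) <= ' ') {
				e--;
			}
			return new Substring(source, s, e);
		}

		boolean isEmpty() {
			return start == end;
		}

		/** Prefix check limited to the viewed region, via String#startsWith(String, int). */
		boolean startsWith(String prefix) {
			return prefix.length() <= end - start && source.startsWith(prefix, start);
		}

		/** Materializes the view; the only step here that allocates a new String. */
		@Override
		public String toString() {
			return source.substring(start, end);
		}
	}

	public static void main(String[] args) {
		String source = "  reactor.core.publisher.Mono.map \nnext";
		Substring line = new Substring(source, 0, source.indexOf('\n')).trim();
		System.out.println(line.startsWith("reactor.core.publisher."));
		System.out.println(line);
	}
}
```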

Benchmark of the code on `main`

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    91.389 ±   1.255   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  8765.340 ± 120.723  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   840.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   119.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   179.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5   122.102 ±   3.312   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  8060.272 ± 216.623  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   158.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5   148.867 ±   2.750   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  7892.060 ± 145.950  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   107.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   156.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5   122.145 ±   1.654   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  8057.222 ± 109.248  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   165.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5   145.273 ±   0.361   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  8087.325 ±  20.159  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   110.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   158.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5   177.756 ±   2.771   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  7639.569 ± 119.523  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5  1424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   104.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   152.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5   147.853 ±   2.101   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  8049.391 ± 114.613  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5  1248.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   151.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5   176.972 ±   2.701   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  7759.568 ± 118.381  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5  1440.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   154.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5   199.950 ±   0.792   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  8203.243 ±  32.542  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5  1720.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   162.000                ms

Benchmark of the already-reviewed code

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     27.709 ±   1.321   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  12390.947 ± 590.685  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    360.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    168.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    226.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     47.622 ±   1.357   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  11534.798 ± 330.546  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    157.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    217.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     46.575 ±   2.868   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  11796.026 ± 725.031  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    160.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    222.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     48.489 ±   3.267   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  11330.727 ± 758.029  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    208.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     46.214 ±   2.285   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11887.215 ± 584.466  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    162.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    212.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     46.943 ±   2.083   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11702.329 ± 513.790  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    159.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    230.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     48.431 ±   1.966   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10712.378 ± 436.213  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     46.974 ±   3.057   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  11046.458 ± 731.622  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    151.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     50.708 ±   0.980   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10230.724 ± 197.935  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    139.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    187.000                ms

Benchmark after the first improvement

Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units TracesBenchmark.measureThroughput 0 0 avgt 5 21.666 ± 0.360 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 10563.716 ± 175.257 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 240.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 144.000 counts TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 212.000 ms TracesBenchmark.measureThroughput 0 1 avgt 5 36.057 ± 1.438 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 10791.558 ± 429.161 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 408.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 146.000 counts TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 206.000 ms TracesBenchmark.measureThroughput 0 2 avgt 5 35.516 ± 1.274 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 10955.809 ± 391.095 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 408.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 149.000 counts TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 209.000 ms TracesBenchmark.measureThroughput 1 0 avgt 5 37.723 ± 1.266 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 10719.145 ± 360.313 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 424.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 145.000 counts TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 201.000 ms TracesBenchmark.measureThroughput 1 1 avgt 5 35.600 ± 2.316 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 11360.236 ± 738.477 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 424.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 154.000 counts TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 221.000 ms TracesBenchmark.measureThroughput 1 2 avgt 5 36.167 ± 1.605 ns/op 
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 11180.857 ± 492.758 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 424.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 152.000 counts TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 213.000 ms TracesBenchmark.measureThroughput 2 0 avgt 5 34.945 ± 0.914 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 10697.778 ± 280.941 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 392.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 145.000 counts TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 210.000 ms TracesBenchmark.measureThroughput 2 1 avgt 5 35.215 ± 1.076 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 10615.784 ± 325.177 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 392.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 144.000 counts TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 198.000 ms TracesBenchmark.measureThroughput 2 2 avgt 5 35.754 ± 1.788 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 10456.676 ± 524.036 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 392.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 142.000 counts TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 200.000 ms

Benchmark after the second improvement

Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units TracesBenchmark.measureThroughput 0 0 avgt 5 16.695 ± 0.223 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 6854.558 ± 91.285 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 120.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 94.000 counts TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 130.000 ms TracesBenchmark.measureThroughput 0 1 avgt 5 27.538 ± 2.173 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 7482.351 ± 582.492 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 216.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 102.000 counts TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 143.000 ms TracesBenchmark.measureThroughput 0 2 avgt 5 26.589 ± 1.541 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 7748.273 ± 448.220 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 216.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 106.000 counts TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 154.000 ms TracesBenchmark.measureThroughput 1 0 avgt 5 28.482 ± 0.514 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 7500.140 ± 135.789 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 224.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 102.000 counts TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 150.000 ms TracesBenchmark.measureThroughput 1 1 avgt 5 26.180 ± 0.323 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 8159.420 ± 101.176 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 224.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 111.000 counts TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 163.000 ms TracesBenchmark.measureThroughput 1 2 avgt 5 26.194 ± 0.395 ns/op 
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 8155.085 ± 122.896 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 224.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 111.000 counts TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 157.000 ms TracesBenchmark.measureThroughput 2 0 avgt 5 26.485 ± 0.255 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 7489.392 ± 72.106 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 208.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 102.000 counts TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 146.000 ms TracesBenchmark.measureThroughput 2 1 avgt 5 26.530 ± 0.298 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 7476.572 ± 83.959 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 208.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 102.000 counts TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 149.000 ms TracesBenchmark.measureThroughput 2 2 avgt 5 26.710 ± 0.566 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 7426.339 ± 157.301 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 208.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 101.000 counts TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 146.000 ms

Benchmark after the third improvement

Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units TracesBenchmark.measureThroughput 0 0 avgt 5 18.096 ± 0.304 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 7588.502 ± 127.338 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 144.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 103.000 counts TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 145.000 ms TracesBenchmark.measureThroughput 0 1 avgt 5 22.850 ± 0.378 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 5008.182 ± 83.316 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 120.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 68.000 counts TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 98.000 ms TracesBenchmark.measureThroughput 0 2 avgt 5 20.251 ± 0.517 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 5651.105 ± 144.835 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 120.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 77.000 counts TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 114.000 ms TracesBenchmark.measureThroughput 1 0 avgt 5 23.276 ± 0.700 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 4261.171 ± 128.182 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 104.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 58.000 counts TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 79.000 ms TracesBenchmark.measureThroughput 1 1 avgt 5 19.319 ± 0.161 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 5133.773 ± 43.098 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 104.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 70.000 counts TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 96.000 ms TracesBenchmark.measureThroughput 1 2 avgt 5 19.673 ± 0.342 ns/op 
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 5041.454 ± 87.186 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 104.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 68.000 counts TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 89.000 ms TracesBenchmark.measureThroughput 2 0 avgt 5 18.855 ± 0.652 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 5260.401 ± 184.048 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 104.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 72.000 counts TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 99.000 ms TracesBenchmark.measureThroughput 2 1 avgt 5 18.931 ± 0.206 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 5238.827 ± 56.895 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 104.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 71.000 counts TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 103.000 ms TracesBenchmark.measureThroughput 2 2 avgt 5 18.723 ± 0.983 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 5297.753 ± 279.570 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 104.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 72.000 counts TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 100.000 ms

So for the common (1, 1) case, we see the following timing and memory usage differences:

Variant Speed Normalized garbage ============= ============================ ========================== Baseline 145.273 ± 0.361 ns/op 1232.000 ± 0.001 B/op Reviewed code 46.214 ± 2.285 ns/op (-68%) 576.000 ± 0.001 B/op (-53%) Speedup 1 35.600 ± 2.316 ns/op (-23%) 424.000 ± 0.001 B/op (-26%) Speedup 2 26.180 ± 0.323 ns/op (-26%) 224.000 ± 0.001 B/op (-47%) Speedup 3 19.319 ± 0.161 ns/op (-26%) 104.000 ± 0.001 B/op (-53%)

I can see how "Speedup 3" is controversial from a maintainability point of view. I guess the only way to justify it, is by realizing that we're dealing with a very hot codepath here (at least for Reactor Debug Agent users). Up to you :)

Stephan202

Added four commits, as described in the comments :)

reactor-core/src/main/java/reactor/core/publisher/Traces.java

Stephan202 · 2024-11-30T00:01:06Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+			}
+
+			@Nullable
+			private String getNextLine() {


I had a look at this, and went down the rabbit hole. The TL;DR is that I added three commits that further improve performance:

One that avoids String#join, unrelated to your suggestion here.

One that replaces String#trim, such that the original string's underlying char[] is always reused, thanks to the implementation of String#substring. This is IIUC an alternative to your suggestion to use CharBuffer; I couldn't use the latter, as it lacks operations such as startsWith and indexOf.

One that introduces a custom Substring class and is "somehow" even more performant.
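
As a rough illustration of the second commit's idea, the sketch below trims by scanning indices and calling String#substring at most once. It is not the code from the PR: `TrimSketch` and `trimByIndex` are hypothetical names, and it assumes the same whitespace definition as String#trim (characters up to U+0020). In OpenJDK, String#substring returns the receiver itself when the requested range spans the whole string, but that is an implementation detail rather than a documented guarantee.

```java
final class TrimSketch {
    // Hypothetical helper (not from the PR): strips leading and trailing
    // characters <= U+0020, the same set that String#trim removes.
    static String trimByIndex(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) <= ' ') {
            end--;
        }
        // One substring call; in OpenJDK this returns `s` itself when
        // start == 0 and end == s.length() (assumption: OpenJDK behavior).
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String padded = "\treactor.core.publisher.Flux.map(Flux.java:1)";
        String clean = "reactor.core.publisher.Flux.map(Flux.java:1)";
        System.out.println(trimByIndex(padded).equals(clean)); // true
        System.out.println(trimByIndex(clean).equals(clean));  // true
    }
}
```

Going one step further and keeping only the `start` and `end` indices, without materializing a new String at all, is presumably what motivates a dedicated view class.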

Benchmark of the code on `main`

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error  Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     91.389 ±   1.255  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5   8765.340 ± 120.723  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    840.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    119.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    179.000            ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5    122.102 ±   3.312  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5   8060.272 ± 216.623  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5   1032.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    158.000            ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5    148.867 ±   2.750  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5   7892.060 ± 145.950  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5   1232.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    107.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    156.000            ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5    122.145 ±   1.654  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5   8057.222 ± 109.248  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5   1032.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    165.000            ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5    145.273 ±   0.361  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5   8087.325 ±  20.159  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5   1232.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    110.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    158.000            ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5    177.756 ±   2.771  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5   7639.569 ± 119.523  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5   1424.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    104.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    152.000            ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5    147.853 ±   2.101  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5   8049.391 ± 114.613  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5   1248.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    151.000            ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5    176.972 ±   2.701  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5   7759.568 ± 118.381  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5   1440.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    154.000            ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5    199.950 ±   0.792  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5   8203.243 ±  32.542  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5   1720.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    162.000            ms

Benchmark of the already-reviewed code

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error  Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     27.709 ±   1.321  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  12390.947 ± 590.685  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    360.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    168.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    226.000            ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     47.622 ±   1.357  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  11534.798 ± 330.546  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    576.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    157.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    217.000            ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     46.575 ±   2.868  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  11796.026 ± 725.031  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    576.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    160.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    222.000            ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     48.489 ±   3.267  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  11330.727 ± 758.029  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    576.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    208.000            ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     46.214 ±   2.285  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11887.215 ± 584.466  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    576.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    162.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    212.000            ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     46.943 ±   2.083  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11702.329 ± 513.790  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    576.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    159.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    230.000            ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     48.431 ±   1.966  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10712.378 ± 436.213  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    544.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000            ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     46.974 ±   3.057  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  11046.458 ± 731.622  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    544.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    151.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    210.000            ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     50.708 ±   0.980  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10230.724 ± 197.935  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    544.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    139.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    187.000            ms

Benchmark after the first improvement

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error  Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     21.666 ±   0.360  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  10563.716 ± 175.257  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    240.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    144.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    212.000            ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     36.057 ±   1.438  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  10791.558 ± 429.161  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    408.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    206.000            ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     35.516 ±   1.274  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  10955.809 ± 391.095  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    408.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    149.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    209.000            ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     37.723 ±   1.266  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  10719.145 ± 360.313  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    424.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    145.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    201.000            ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     35.600 ±   2.316  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11360.236 ± 738.477  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    424.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    221.000            ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     36.167 ±   1.605  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11180.857 ± 492.758  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    424.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    152.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    213.000            ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     34.945 ±   0.914  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10697.778 ± 280.941  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    392.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    145.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000            ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     35.215 ±   1.076  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  10615.784 ± 325.177  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    392.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    144.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    198.000            ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     35.754 ±   1.788  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10456.676 ± 524.036  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    392.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    142.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    200.000            ms

Benchmark after the second improvement

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error  Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     16.695 ±   0.223  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5   6854.558 ±  91.285  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    120.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5     94.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    130.000            ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     27.538 ±   2.173  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5   7482.351 ± 582.492  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    216.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    143.000            ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     26.589 ±   1.541  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5   7748.273 ± 448.220  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    216.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    154.000            ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     28.482 ±   0.514  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5   7500.140 ± 135.789  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    224.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    150.000            ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     26.180 ±   0.323  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5   8159.420 ± 101.176  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    224.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    163.000            ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     26.194 ±   0.395  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5   8155.085 ± 122.896  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    224.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    157.000            ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     26.485 ±   0.255  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5   7489.392 ±  72.106  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    208.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    146.000            ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     26.530 ±   0.298  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5   7476.572 ±  83.959  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    208.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    149.000            ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     26.710 ±   0.566  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5   7426.339 ± 157.301  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    208.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    101.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    146.000            ms

Benchmark after the third improvement

Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error  Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     18.096 ±   0.304  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5   7588.502 ± 127.338  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    144.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    103.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    145.000            ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     22.850 ±   0.378  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5   5008.182 ±  83.316  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    120.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5     68.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5     98.000            ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     20.251 ±   0.517  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5   5651.105 ± 144.835  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    120.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5     77.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    114.000            ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     23.276 ±   0.700  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5   4261.171 ± 128.182  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    104.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5     58.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5     79.000            ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     19.319 ±   0.161  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5   5133.773 ±  43.098  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    104.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5     70.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5     96.000            ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     19.673 ±   0.342  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5   5041.454 ±  87.186  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    104.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5     68.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5     89.000            ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     18.855 ±   0.652  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5   5260.401 ± 184.048  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    104.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5     72.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5     99.000            ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     18.931 ±   0.206  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5   5238.827 ±  56.895  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    104.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5     71.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    103.000            ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     18.723 ±   0.983  ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5   5297.753 ± 279.570  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    104.000 ±   0.001  B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5     72.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    100.000            ms

So for the common (1, 1) case, we see the following timing and memory usage differences:

Variant        Speed                         Normalized garbage
=============  ============================  ===========================
Baseline       145.273 ± 0.361 ns/op         1232.000 ± 0.001 B/op
Reviewed code   46.214 ± 2.285 ns/op (-68%)   576.000 ± 0.001 B/op (-53%)
Speedup 1       35.600 ± 2.316 ns/op (-23%)   424.000 ± 0.001 B/op (-26%)
Speedup 2       26.180 ± 0.323 ns/op (-26%)   224.000 ± 0.001 B/op (-47%)
Speedup 3       19.319 ± 0.161 ns/op (-26%)   104.000 ± 0.001 B/op (-53%)

I can see how "Speedup 3" is controversial from a maintainability point of view. I guess the only way to justify it is by realizing that we're dealing with a very hot code path here (at least for Reactor Debug Agent users). Up to you :)

Stephan202 · 2024-11-30T00:14:48Z

NB: One further speed-up could be to avoid the string trimming altogether: IIUC the whitespace is introduced only in these places:

That last one is part of the Reactor Debug Agent, shipped as a separate JAR. So the question is whether we can stop trimming before the next major release. (Unless users are required to use reactor-core and reactor-tools versions that match down to the patch level, but given that the Java agent may be configured outside of the application in which reactor-core is bundled, that would seem like a rather strict requirement.) But we could already stop creating the whitespace.

Trimming of trailing whitespace can already be dropped (it's never introduced), though that would require updating the unit tests.

Happy to do in this or another PR; just let me know.

chemicL · 2024-12-02T08:24:13Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+		}
+
+		boolean startsWith(String prefix) {
+			return str.startsWith(prefix, start);


For our specific use case it probably won't be a problem but in general this is incomplete as the end index is not considered. Perhaps a simple check for start + prefix.length() < end is also required to make it correct?
With that, I'd argue a bunch of unit tests for this inner class would be helpful. It can be made package private and we can test it in TracesTest.java.
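
A minimal sketch of the bounds-aware check discussed above, assuming the view is defined by a backing string plus an inclusive `start` and an exclusive `end` offset. The class name `BoundedView` and its constructor are hypothetical and are not the PR's actual code. Under that exclusive-`end` assumption the comparison is `<=` rather than a strict `<`, so that a prefix ending exactly at `end` still matches.

```java
// Hypothetical stand-in for the inner class under review; illustration only.
final class BoundedView {
	private final String str;
	private final int start; // inclusive offset into str
	private final int end;   // exclusive offset into str (assumption)

	BoundedView(String str, int start, int end) {
		this.str = str;
		this.start = start;
		this.end = end;
	}

	boolean startsWith(String prefix) {
		// Reject prefixes that would run past the end of the view, then
		// delegate to String#startsWith(String, int) for the comparison.
		return start + prefix.length() <= end && str.startsWith(prefix, start);
	}
}
```

With this guard, a view over the first three characters of `"foobar"` matches the prefix `"foo"` but not `"foob"`, even though the backing string itself starts with `"foob"`.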

chemicL · 2024-12-02T08:30:31Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+	}
+
+	// XXX: Explain.
+	private static final class Substring {


Consider renaming it to StringView. This would communicate to the reader that we are only wrapping the underlying String and not making any copies of it.

Or even StackLineView, since it exposes methods specific to the lines in the stack trace.
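
To illustrate what "wrapping without copying" could look like, here is a rough sketch of a view that trims whitespace by moving its offsets rather than allocating a new string. The name `StringViewSketch`, its fields, and its methods are all hypothetical; the PR's actual class may differ in shape and API.

```java
// Hypothetical illustration of a non-copying view over a String.
final class StringViewSketch {
	private final String underlying;
	private final int start; // inclusive
	private final int end;   // exclusive

	StringViewSketch(String underlying) {
		this(underlying, 0, underlying.length());
	}

	private StringViewSketch(String underlying, int start, int end) {
		this.underlying = underlying;
		this.start = start;
		this.end = end;
	}

	// Returns a narrower view over the same backing string. Like
	// String#trim, it skips characters <= ' ', but copies no characters.
	StringViewSketch trim() {
		int s = start;
		int e = end;
		while (s < e && underlying.charAt(s) <= ' ') {
			s++;
		}
		while (e > s && underlying.charAt(e - 1) <= ' ') {
			e--;
		}
		return new StringViewSketch(underlying, s, e);
	}

	int length() {
		return end - start;
	}

	boolean isEmpty() {
		return start == end;
	}

	@Override
	public String toString() {
		// The only place where characters are actually copied.
		return underlying.substring(start, end);
	}
}
```

The design trade-off is that each `trim()` still allocates a small view object, but the character data is shared, which matters most for long stack trace lines.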

chemicL · 2024-12-02T08:31:43Z

reactor-core/src/main/java/reactor/core/publisher/Traces.java

+
+	// XXX: Explain.
+	private static final class Substring {
+		private final String str;


Consider renaming str to underlying, actual, or backingString to indicate its nature.

chemicL · 2024-12-02T08:41:51Z

@Stephan202 the numbers you show look excellent. I think it's worth integrating the current proposal, since in the (1, 1) case you get a roughly 7.5x speedup and about 12x less memory pressure!
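
For reference, those factors follow from dividing the "Baseline" row by the "Speedup 3" row of the (1, 1) comparison. A small sketch of the arithmetic (the class and method names are made up for illustration):

```java
import java.util.Locale;

// Illustrative arithmetic only; the inputs are the (1, 1) figures quoted in this thread.
final class SpeedupArithmetic {
	// Ratio of the baseline measurement to the optimized one.
	static double ratio(double baseline, double optimized) {
		return baseline / optimized;
	}

	public static void main(String[] args) {
		// 145.273 ns/op (baseline) versus 19.319 ns/op (Speedup 3).
		System.out.printf(Locale.ROOT, "time: %.2fx%n", ratio(145.273, 19.319));
		// 1232 B/op (baseline) versus 104 B/op (Speedup 3).
		System.out.printf(Locale.ROOT, "allocation: %.2fx%n", ratio(1232.0, 104.0));
	}
}
```

This prints `time: 7.52x` and `allocation: 11.85x`.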

Regarding avoiding the trimming - I think we can stop here. With the above result, changing the behaviour is not justified I believe. I suspect the tabs are useful in some form when printing the stack trace and are only trimmed when finding the "trace-back", right? Anyways, if you want to spend more time on it and show we're not breaking anything, we can discuss that in another issue/PR and aim for closing this one so we can release it soon :)

Please add a unit test, potentially apply some refactoring suggestions and add a comment in place of // XXX that you left and we can ship this. I'm excited to hear about your production savings with this change :)

chemicL

I'll merge and follow up applying the most recent feedback. I'd like for this optimization to be part of the upcoming release.
Thanks a lot @Stephan202, it has been a true pleasure to work with your contributions. Please do come back with more ideas!

Stephan202 · 2024-12-04T13:08:35Z

Thanks for the thoughtful code review and thanks for filing #3949! I too am curious to see how this fares in production. If I have details to share, I'll report back here :)

(As for avoiding the trimming: if I have spare cycles in the future I don't mind having a closer look at that; TBD!)

Follow-up to #3901:

* Renamed `Substring` to `StackLineView`
* Implemented tests for `StackLineView`
* Corrected `contains` and `startsWith` implementations

chemicL · 2024-12-18T22:16:12Z

@Stephan202 I was curious if you had a chance to deploy the newest version?

Stephan202 · 2024-12-21T11:29:26Z

Hey @chemicL! Sorry for not following up. A colleague did test this change in isolation in production, but unfortunately no clear impact was measured. TBH, slightly surprising and disappointing. We're using Datadog in production, but due to Reactor's deep call stacks, DD's JFR-based profiling functionality can't correlate CPU and memory usage with some of our hottest reactive code. This makes it hard to do a more fine-grained before- and after comparison. (We currently run with -XX:FlightRecorderOptions=stackdepth=512, but last I tested this, even the maximum value of -XX:FlightRecorderOptions=stackdepth=2048 didn't change this.)

One caveat is that I didn't find time to do a deeper analysis myself, and now with Christmas coming up there's no appetite to run more experiments (e.g. by doing a temporary downgrade) in production 😬.

Stephan202 requested a review from a team as a code owner October 3, 2024 10:34

This was referenced Oct 3, 2024

Defer Scannable#name() fallback logic #3900

Closed

Skip redundant tag deduplication #3902

Merged

chemicL added the area/performance This belongs to the performance theme label Oct 4, 2024

Stephan202 changed the title ~~Optimize Traces#extractOperatorAssemblyInformationParts~~ Optimize Scannable#name() and related logic Oct 4, 2024

Stephan202 force-pushed the sschroevers/traces-performance-improvement branch from f7ebe1c to f470638 Compare November 21, 2024 17:52

Stephan202 added 3 commits November 26, 2024 22:23

Defer Scannable#name() fallback logic

a67e27c

(cherry picked from commit 009ec89)

Update benchmark

c3e519a

Stephan202 force-pushed the sschroevers/traces-performance-improvement branch from f470638 to c3e519a Compare November 27, 2024 21:49

Address PR feedback

6f7d079

chemicL requested changes Nov 29, 2024

chemicL reviewed Nov 29, 2024

Address PR feedback

7c8ce7b

Stephan202 added 3 commits November 30, 2024 00:26

Avoid String#join

02fa17e

Avoid copy on trim

36b3845

Custom substring implementation

9cef75f

chemicL reviewed Dec 2, 2024

This was referenced Dec 2, 2024

Improve scaling behavior of Mono<Void> and(Publisher<?> other) #3920

Open

[docs] Add warning about static exceptions #3944

Merged

Avoid using static exceptions for better debugging experience reactor/reactor-netty#3529

Merged

chemicL approved these changes Dec 4, 2024

chemicL merged commit 0a87988 into reactor:main Dec 4, 2024
7 checks passed

chemicL added a commit that referenced this pull request Dec 4, 2024

Polish #3901

501101b

chemicL mentioned this pull request Dec 4, 2024

Refactor and add tests for optimized assembly traceback #3949

Merged

chemicL added the type/enhancement A general enhancement label Dec 4, 2024

chemicL added this to the 3.7.1 milestone Dec 4, 2024

chemicL added the area/observability label Dec 4, 2024

Stephan202 deleted the sschroevers/traces-performance-improvement branch December 4, 2024 12:59

chemicL changed the title ~~Optimize Scannable#name() and related logic~~ Optimize observability and debugging experience Dec 4, 2024
