[api] Optimise TraceContextPropagator.Extract #5749

Merged

Conversation

stevejgordon
Contributor

@stevejgordon stevejgordon commented Jul 15, 2024

Contributes to #728

Changes

This PR removes almost all overhead of extracting the tracestate inside the TraceContextPropagator by switching to a stack-allocated or rented buffer when building the final string. Specifically, this avoids using StringBuilder and HashSet to prevent heap allocations. The execution time is also slightly improved with this approach. I've preserved the existing behaviour based on the tests.
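As a rough illustration of the approach described above (not the code from this PR), the sketch below joins `key=value` members into a stack-allocated or rented `Span<char>` and checks for duplicate keys by scanning what has already been written, so neither `StringBuilder` nor `HashSet` is needed. The class name, the `TryJoin` signature, the 128-char threshold, and the choice to reject malformed or duplicate members are all assumptions made for the example.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

public static class TraceStateSketch
{
    // Hypothetical threshold; the review thread discusses 128 vs 256 chars.
    private const int StackallocCharThreshold = 128;

    public static bool TryJoin(IReadOnlyList<string> members, out string result)
    {
        result = string.Empty;

        int maxLength = 0;
        for (int i = 0; i < members.Count; i++)
        {
            maxLength += members[i].Length + 1; // +1 leaves room for a ',' separator
        }

        char[]? rented = null;
        Span<char> buffer = maxLength <= StackallocCharThreshold
            ? stackalloc char[StackallocCharThreshold]
            : (rented = ArrayPool<char>.Shared.Rent(maxLength));

        try
        {
            int written = 0;
            for (int i = 0; i < members.Count; i++)
            {
                ReadOnlySpan<char> member = members[i].AsSpan();
                int eq = member.IndexOf('=');
                if (eq <= 0 || ContainsKey(buffer.Slice(0, written), member.Slice(0, eq)))
                {
                    return false; // malformed or duplicate key: reject (assumption)
                }

                if (written > 0)
                {
                    buffer[written++] = ',';
                }

                member.CopyTo(buffer.Slice(written));
                written += member.Length;
            }

            result = buffer.Slice(0, written).ToString(); // the only string allocation
            return true;
        }
        finally
        {
            // Runs on every return path above, so a rented array always goes back.
            if (rented != null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }

    // Linear scan over the members already written; stands in for a HashSet lookup.
    private static bool ContainsKey(ReadOnlySpan<char> written, ReadOnlySpan<char> key)
    {
        while (!written.IsEmpty)
        {
            int comma = written.IndexOf(',');
            ReadOnlySpan<char> member = comma < 0 ? written : written.Slice(0, comma);
            int eq = member.IndexOf('=');
            if (eq > 0 && member.Slice(0, eq).SequenceEqual(key))
            {
                return true;
            }

            written = comma < 0 ? ReadOnlySpan<char>.Empty : written.Slice(comma + 1);
        }

        return false;
    }
}
```

The linear scan is quadratic in the number of members, which is presumably acceptable because tracestate lists are small; that trade-off is an assumption of this sketch rather than something stated in the PR.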

This also adds a benchmark to measure the performance of the extract method.

| Method     | LongListMember | MembersCount | Mean        | Error     | StdDev    | Ratio | RatioSD | Gen0    | Gen1   | Allocated | Alloc Ratio |
|----------- |--------------- |------------- |------------:|----------:|----------:|------:|--------:|--------:|-------:|----------:|------------:|
| Extract    | False          | 0            |    128.2 ns |   2.02 ns |   1.79 ns |  1.00 |    0.00 |  0.0312 |      - |     392 B |        1.00 |
| ExtractNew | False          | 0            |    122.5 ns |   2.20 ns |   2.62 ns |  0.96 |    0.02 |  0.0176 |      - |     224 B |        0.57 |
|            |                |              |             |           |           |       |         |         |        |           |             |
| Extract    | False          | 4            |    551.5 ns |  10.68 ns |  14.61 ns |  1.00 |    0.00 |  0.2108 | 0.0010 |    2656 B |        1.00 |
| ExtractNew | False          | 4            |    309.9 ns |   3.43 ns |   2.86 ns |  0.56 |    0.02 |  0.0463 |      - |     584 B |        0.22 |
|            |                |              |             |           |           |       |         |         |        |           |             |
| Extract    | False          | 32           |  3,419.8 ns | 145.07 ns | 420.86 ns |  1.00 |    0.00 |  1.2512 | 0.0381 |   15704 B |        1.00 |
| ExtractNew | False          | 32           |  2,841.3 ns |  54.61 ns |  76.56 ns |  0.83 |    0.05 |  0.2327 |      - |    2936 B |        0.19 |
|            |                |              |             |           |           |       |         |         |        |           |             |
| Extract    | True           | 0            |    129.5 ns |   2.47 ns |   3.29 ns |  1.00 |    0.00 |  0.0312 |      - |     392 B |        1.00 |
| ExtractNew | True           | 0            |    131.5 ns |   2.13 ns |   1.99 ns |  1.02 |    0.03 |  0.0176 |      - |     224 B |        0.57 |
|            |                |              |             |           |           |       |         |         |        |           |             |
| Extract    | True           | 4            |  3,052.5 ns |  60.07 ns |  59.00 ns |  1.00 |    0.00 |  1.5640 | 0.0648 |   19648 B |        1.00 |
| ExtractNew | True           | 4            |  1,972.3 ns |  38.22 ns |  35.75 ns |  0.65 |    0.03 |  0.3471 | 0.0038 |    4360 B |        0.22 |
|            |                |              |             |           |           |       |         |         |        |           |             |
| Extract    | True           | 32           | 20,789.8 ns | 409.46 ns | 438.11 ns |  1.00 |    0.00 | 10.6812 | 2.1057 |  134392 B |        1.00 |
| ExtractNew | True           | 32           | 15,939.3 ns | 313.31 ns | 505.94 ns |  0.76 |    0.03 |  2.6245 | 0.3204 |   33147 B |        0.25 |
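For readers checking the table, the Ratio and Alloc Ratio columns can be approximately reproduced by dividing each `ExtractNew` figure by the matching `Extract` baseline. The helper below is a hypothetical convenience for that arithmetic, not part of the benchmark code; BenchmarkDotNet computes its Ratio column from the measurement distributions, so a simple division of the means may differ slightly in the last digit.

```csharp
using System;

public static class RatioCheck
{
    // Candidate divided by baseline, rounded to two decimals like the table.
    public static double Ratio(double candidate, double baseline) =>
        Math.Round(candidate / baseline, 2);
}
```

For the 4-member, short-value row, `RatioCheck.Ratio(309.9, 551.5)` gives 0.56 and `RatioCheck.Ratio(584, 2656)` gives 0.22, matching the reported columns.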

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • [ ] Changes in public API reviewed (if applicable)

@stevejgordon stevejgordon requested a review from a team July 15, 2024 15:55
@github-actions github-actions bot added pkg:OpenTelemetry.Api Issues related to OpenTelemetry.Api NuGet package pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package labels Jul 15, 2024

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 94.73684% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.22%. Comparing base (6250307) to head (d43e26f).
Report is 288 commits behind head on main.

Files Patch % Lines
....Api/Context/Propagation/TraceContextPropagator.cs 94.73% 3 Missing ⚠️

@@            Coverage Diff             @@
##             main    #5749      +/-   ##
==========================================
+ Coverage   83.38%   86.22%   +2.84%     
==========================================
  Files         297      256      -41     
  Lines       12531    11140    -1391     
==========================================
- Hits        10449     9606     -843     
+ Misses       2082     1534     -548     
Flag Coverage Δ
unittests ?
unittests-Project-Experimental 86.17% <94.73%> (?)
unittests-Project-Stable 86.25% <94.73%> (?)

Files Coverage Δ
....Api/Context/Propagation/TraceContextPropagator.cs 90.82% <94.73%> (+1.35%) ⬆️

... and 201 files with indirect coverage changes

Comment on lines 228 to 229
Span<char> buffer = stackalloc char[256];
Span<char> keyLookupBuffer = stackalloc char[96]; // 3 x 32 keys
Member

Love what this PR is doing but I have some concern here with allocating 704 bytes on the stack ((256 + 96) * 2).

System.Text.Json for example will at max stackalloc 256 bytes (128 chars): https://github.com/dotnet/runtime/blob/a86987ccab917433d065fe5dc8870fc261f79d14/src/libraries/System.Text.Json/Common/JsonConstants.cs#L12-L13

Not sure why that number, but I'm sure a lot of thought went into it 😄 /cc @stephentoub
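The byte counts quoted in this comment follow from `char` being 2 bytes in .NET. A small sketch of the arithmetic, with constant names that are hypothetical and chosen only for this example:

```csharp
public static class StackBudget
{
    public const int BufferChars = 256;    // stackalloc char[256] in the diff
    public const int KeyLookupChars = 96;  // stackalloc char[96], i.e. 3 x 32 keys

    // (256 + 96) * 2 = 704 bytes requested from the stack.
    public const int TotalBytes = (BufferChars + KeyLookupChars) * sizeof(char);

    // The System.Text.Json figure cited above: 128 chars * 2 = 256 bytes.
    public const int JsonThresholdBytes = 128 * sizeof(char);
}
```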

I think this pattern for "stackalloc with fallback to rented array" is solid but the version here seems a bit off from what I have seen. IMO it more commonly looks like this:

char[]? rentedBuffer = null;
Span<char> destination = length <= Constants.StackallocCharThreshold
    ? stackalloc char[Constants.StackallocCharThreshold]
    : (rentedBuffer = ArrayPool<char>.Shared.Rent(length));

try
{
    ...
}
finally
{
    if (rentedBuffer != null)
    {
        ArrayPool<char>.Shared.Return(rentedBuffer);
    }
}

Would something like that work/help simplify things here?

> Not sure why that number, but I'm sure a lot of thought went into it 😄

It's a bit squishy. We almost never go above 1K. We typically use 256 or 512 bytes, but it varies case-to-case based on knowledge of that particular location and how likely longer buffers are expected to be needed.

Contributor Author

@stevejgordon stevejgordon Jul 16, 2024

@CodeBlanch I think I'd seen somewhere that 1K was a "reasonable" max value. I went for 256 chars as, realistically, that should cover most trace state scenarios. We could drop to 128 (256B) instead, though, and still cover most shorter trace state strings. Combined with the 96 chars (192B) for the duplicate lookup, that might be more reasonable.

I can switch to the try/finally here. The Return method initially had some extra logic, but I refactored that before the PR. I left it in that form because I preferred avoiding one extra level of indentation introduced with the extra blocks.

In this pattern the try-finally can be omitted. If an exception is thrown, the buffer (if rented) is simply dropped rather than returned to the pool, which isn't a problem for the pool.

Early incarnations of this pattern used try-finally, but the pattern has since evolved.
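A minimal sketch of the variant without try-finally, assuming a method with a single exit point. The class, method, and threshold are hypothetical. If anything between the rent and the return threw, the rented array would never go back to the pool and would simply be reclaimed by the garbage collector, costing the pool a reusable array but not breaking it.

```csharp
using System;
using System.Buffers;

public static class NoFinallySketch
{
    private const int StackallocCharThreshold = 128; // hypothetical threshold

    public static string Repeat(char c, int length)
    {
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(length));
        }

        char[]? rented = null;
        Span<char> destination = length <= StackallocCharThreshold
            ? stackalloc char[StackallocCharThreshold]
            : (rented = ArrayPool<char>.Shared.Rent(length));

        // Rent may hand back a larger array, so trim to the requested length.
        destination = destination.Slice(0, length);
        destination.Fill(c);
        string result = destination.ToString();

        // Single exit point: return the array only on the success path.
        if (rented != null)
        {
            ArrayPool<char>.Shared.Return(rented);
        }

        return result;
    }
}
```

This form only works cleanly when there is one way out of the method; with several early returns, each one would need its own `Return` call.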

Member

I don't care so much about the try-finally; we could drop that. I just want to make sure the stackalloc is good.

We can't really control where Extract runs, so it is probably a good idea to be conservative. For incoming requests (think AspNetCore instrumentation) it will probably run fairly early. But for something like processing messages from a queue, who knows! What I would really like is for us to avoid the stackalloc when we know there is a lot of tracestate, but I guess that is hard to do because there could be multiple headers needing to be processed?
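One way the multiple-header concern could be handled, sketched here purely as an assumption and not as what the PR does, is to total the lengths of all header values first and only use the stack when the sum fits under a threshold. The names and the 128-char threshold are hypothetical, and the extra pass enumerates the header values twice, which may or may not be acceptable depending on the getter.

```csharp
using System;
using System.Collections.Generic;

public static class SizedBufferSketch
{
    private const int StackallocCharThreshold = 128; // hypothetical threshold

    // Upper bound on the joined length: every value plus one separator each.
    public static int TotalLength(IEnumerable<string> headerValues)
    {
        int total = 0;
        foreach (string value in headerValues)
        {
            total += value.Length + 1;
        }

        return total;
    }

    // True when the combined tracestate is small enough to stackalloc;
    // otherwise the caller would go straight to ArrayPool.
    public static bool FitsOnStack(IEnumerable<string> headerValues) =>
        TotalLength(headerValues) <= StackallocCharThreshold;
}
```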

Contributor Author

@CodeBlanch We could just be safe and go with ArrayPool. I preferred stackalloc since we might reasonably expect the state to be small, and we can avoid a small amount of overhead that we incur by renting.

Contributor Author

@gfoidl I think we need the finally here, though, as there are multiple branches where this code could return from this method on the non-exception path, and we need to ensure that the array is returned in those cases, too.

Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added Stale Issues and pull requests which have been flagged for closing due to inactivity and removed Stale Issues and pull requests which have been flagged for closing due to inactivity labels Jul 31, 2024
@stevejgordon
Contributor Author

@CodeBlanch, I am just bumping this to keep it open and to see if we can resolve the stackalloc discussion offline to move this along. I had hoped to attend the SIG to raise it, but the timing didn't work out again last night.

@CodeBlanch
Member

Sorry for the delay on this @stevejgordon! We did look at this on the SIG yesterday. @vishweshbankwar will do a review pass when he has a moment and then we'll go from there.

Co-authored-by: Vishwesh Bankwar <vishweshbankwar@users.noreply.github.com>
@github-actions github-actions bot added the perf Performance related label Aug 8, 2024
Member

@vishweshbankwar vishweshbankwar left a comment

LGTM

@CodeBlanch CodeBlanch changed the title Optimise TraceContextPropagator.Extract [api] Optimise TraceContextPropagator.Extract Aug 12, 2024
@CodeBlanch CodeBlanch merged commit 83ecef8 into open-telemetry:main Aug 12, 2024
40 checks passed
@stevejgordon stevejgordon deleted the perf/tracecontextpropagator branch September 10, 2024 14:10
Labels
perf Performance related pkg:OpenTelemetry.Api Issues related to OpenTelemetry.Api NuGet package pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package
5 participants