Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ekump/APMSP-1279 benchmark trace exporting #531

Merged
merged 12 commits into from
Jul 17, 2024

Conversation

ekump
Copy link
Contributor

@ekump ekump commented Jul 15, 2024

What does this PR do?

  1. Move some logic that resided in the sidecar for building a TracerPayloadCollection from msgpack bytes and TracerHeaderTags into the trace_utils crate as this logic can / should be used anywhere TracerPayloadCollection is used. No changes to the logic were made. This logic includes the deserialization of traces in msgpack format into an internal representation of traces (currently pb). Previously, this functionality wasn't directly covered under unit tests. Now it is.
  2. Add a micro-benchmark test that covers the creation of TracerPayloadCollections for v04 traces.

Motivation

There is a known problem with excessive allocations when processing traces through libdatadog that is currently blocking the sidecar from being turned on by default for PHP customers. This problem also blocks adoption of data-pipeline by .NET and Node libraries. These benchmarks should help establish a baseline for forthcoming work to improve performance. By moving the deserialization of msgpack closer to the TracerPayloadCollection implementation it ensures common encapsulated logic for both data-pipeline and SIdecar.

Additional Notes

This implementation currently only works for v04. Work to support v05 and v07 should follow.

How to test the change?

Unit tests added.
benchmarks can be run via cargo bench --package datadog-trace-utils

@codecov-commenter
Copy link

codecov-commenter commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 86.82927% with 27 lines in your changes missing coverage. Please review.

Project coverage is 70.42%. Comparing base (63d4ec8) to head (fda4ffa).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #531      +/-   ##
==========================================
+ Coverage   70.29%   70.42%   +0.13%     
==========================================
  Files         206      206              
  Lines       27818    27964     +146     
==========================================
+ Hits        19554    19694     +140     
- Misses       8264     8270       +6     
Components Coverage Δ
crashtracker 16.87% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 51.15% <0.00%> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 86.41% <ø> (ø)
ddcommon-ffi 75.31% <ø> (ø)
ddtelemetry 59.02% <ø> (ø)
ipc 84.13% <ø> (ø)
profiling 78.68% <ø> (ø)
profiling-ffi 58.26% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 35.70% <0.00%> (+0.05%) ⬆️
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 54.98% <ø> (ø)
trace-mini-agent 71.04% <100.00%> (+0.10%) ⬆️
trace-normalization 98.24% <ø> (ø)
trace-obfuscation 95.73% <ø> (ø)
trace-protobuf 77.16% <ø> (ø)
trace-utils 90.75% <91.95%> (+0.07%) ⬆️

@ekump ekump force-pushed the ekump/APMSP-1279-benchmark-trace-exporting branch from a67c5ed to 1709ecc Compare July 15, 2024 19:17
@ekump ekump marked this pull request as ready for review July 15, 2024 19:19
@ekump ekump requested review from a team as code owners July 15, 2024 19:19
@ekump ekump force-pushed the ekump/APMSP-1279-benchmark-trace-exporting branch 2 times, most recently from 7acf063 to bbc9f26 Compare July 15, 2024 19:37
@ekump ekump requested a review from a team as a code owner July 15, 2024 19:54
@pr-commenter
Copy link

pr-commenter bot commented Jul 15, 2024

Benchmarks

This comment was omitted because it was over 65536 characters.Please check the Gitlab Job logs to see its output.

@@ -106,14 +105,140 @@ impl TracerPayloadCollection {
}
}

pub type TracerPayloadChunkProcessor<'a> = Box<dyn Fn(&mut pb::TraceChunk, usize) + 'a>;
Copy link
Contributor

@bwoebi bwoebi Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper way to implement this is having a trait TracerPayloadChunkProcessor { fn process(&mut pb::TraceChunk, usize); } and adding a PhantomData<T> to TracerPayloadParams with T: TracerPayloadChunkProcessor.

This also avoids allows rust fully compiling it away if not used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also avoids rust fully compiling it away if not used.

Could you elaborate on why that is beneficial?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant allows, not avoids.

Saving an extra memory allocation and call mostly. Nothing major, just idiomatic and a tiny bit better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to prefer static dispatch. Using boxed fn is a bit like Virtual (dynamic) method dispatch in C++ - useful when polymorphism is required but adding overhead.

Since this is part of trace chunk processing that can happen in a tight loop.

In addition to extra instructions it will have higher chance to incur a page fault whenever processor is called.
Ideally we should keep all code needed to do chunk processing within the CPU cache when possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwoebi Ah ok, that makes more sense.

I agree, using dynamic dispatch is not optimal. I was just trying to limit the blast radius of the refactor and maintain the same function signature for trace_utils::collect_trace_chunks.

I didn't want to commit the changes to this branch yet as I still need to do a bunch of cleanup as well as fix tests. But is this what you had in mind? It uses a trait as you suggested and avoids the need for any dynamic dispatch. There is no need for PhantomData though. If this makes sense, I'll incorporate it into this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that works too :-)

@@ -157,4 +159,4 @@ RUN ./build-profiling-ffi.sh /build/output

FROM scratch as ffi_build_output

COPY --from=ffi_build /build/output/ ./
COPY --from=ffi_build /build/output/ ./
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitt: missing newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get my IDE configured correctly one of these days


[features]
test-utils = ["httpmock", "testcontainers", "cargo_metadata"]
test-utils = ["httpmock", "testcontainers", "cargo_metadata"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitt: missing newline

#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::create_test_span;
use datadog_trace_protobuf::pb::TraceChunk;
use datadog_trace_protobuf::pb::{Span, TraceChunk};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you remove the import of Span everywhere except here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not intentional, will fix.

tracer_header_tags: &'a TracerHeaderTags<'a>,
/// A boxed closure that processes each `TraceChunk`, allowing for custom logic to be applied
/// during the conversion process.
process_chunk: TracerPayloadChunkProcessor<'a>,
Copy link
Contributor

@VianneyRuhlmann VianneyRuhlmann Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we often use a ChunkProcessor ? If not it feels a bit heavy to have to create an empty closure every time we call it maybe it can be an Option or provide an alternative constructor which creates the empty closure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. It currently only works for v07 and is only used by the mini-agent. For now, I created a DefaultTraceChunkProcessor that can be passed so we don't have to implement the trait just to do a no-op. The compiler should optimize away the call to the processor in collect_trace_chunks in this situation.

I'm inclined to leave it as is for now. I think we may want to do some further refactoring to TracerPayloads in the not to distant future and we may want to think of ways to streamline how some of this works.

start: 1,
duration: 5,
error: 0,
meta: HashMap::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR. But we whould generally avoid using Hashmaps for data which doesn't require it.

e.g. in many tracers we replaced HashMap for meta and metrics with a list of key,values - for internal representation - because it avoid a lot of unnecessary memory operations. And is generally alot faster since the map is very rarely looked up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about that, but at least PHP does quite a bunch of meta/metrics lookups at serialization time.
Also, e.g. sampling can be tag based.
And there are also quite a bunch of tags which are tied to state, like sampling, decision makers etc. which end up requiring lookups.
I doubt that a key-value list would bring tangible benefits over a solid HashTable impl.

ekump added 11 commits July 17, 2024 15:37
into TracerPayload.

This functionality to turn msgpack bytes and header tags into a
TracerPayloadCollection should live somewhere more common than the sidecar as it will need to
also be used by data-pipeline (and presumably agentless). It only supports v04 at the moment, but
should eventually support v05 and v07 as well.
Deserializing msgpack bytes into an internal representation is currently
inefficient. Before work starts on fixing the performance issues we want
to set some baseline benchmarks.
@ekump ekump force-pushed the ekump/APMSP-1279-benchmark-trace-exporting branch from 4d97808 to 33799c5 Compare July 17, 2024 19:37
Co-authored-by: Bob Weinand <bob.weinand@datadoghq.com>
@ekump ekump merged commit f07597b into main Jul 17, 2024
32 checks passed
@ekump ekump deleted the ekump/APMSP-1279-benchmark-trace-exporting branch July 17, 2024 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants