-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ekump/APMSP-1279 benchmark trace exporting #531
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #531 +/- ##
==========================================
+ Coverage 70.29% 70.42% +0.13%
==========================================
Files 206 206
Lines 27818 27964 +146
==========================================
+ Hits 19554 19694 +140
- Misses 8264 8270 +6
|
a67c5ed
to
1709ecc
Compare
7acf063
to
bbc9f26
Compare
BenchmarksThis comment was omitted because it was over 65536 characters.Please check the Gitlab Job logs to see its output. |
trace-utils/src/tracer_payload.rs
Outdated
@@ -106,14 +105,140 @@ impl TracerPayloadCollection { | |||
} | |||
} | |||
|
|||
pub type TracerPayloadChunkProcessor<'a> = Box<dyn Fn(&mut pb::TraceChunk, usize) + 'a>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proper way to implement this is having a trait TracerPayloadChunkProcessor { fn process(&mut pb::TraceChunk, usize); }
and adding a PhantomData<T>
to TracerPayloadParams with T: TracerPayloadChunkProcessor
.
This also avoids allows rust fully compiling it away if not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also avoids rust fully compiling it away if not used.
Could you elaborate on why that is beneficial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant allows, not avoids.
Saving an extra memory allocation and call mostly. Nothing major, just idiomatic and a tiny bit better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to prefer static dispatch. Using boxed fn is a bit like Virtual (dynamic) method dispatch in C++ - useful when polymorphism is required but adding overhead.
Since this is part of trace chunk processing that can happen in a tight loop.
In addition to extra instructions it will have higher chance to incur a page fault whenever processor is called.
Ideally we should keep all code needed to do chunk processing within the CPU cache when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bwoebi Ah ok, that makes more sense.
I agree, using dynamic dispatch is not optimal. I was just trying to limit the blast radius of the refactor and maintain the same function signature for trace_utils::collect_trace_chunks
.
I didn't want to commit the changes to this branch yet as I still need to do a bunch of cleanup as well as fix tests. But is this what you had in mind? It uses a trait as you suggested and avoids the need for any dynamic dispatch. There is no need for PhantomData
though. If this makes sense, I'll incorporate it into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that works too :-)
tools/docker/Dockerfile.build
Outdated
@@ -157,4 +159,4 @@ RUN ./build-profiling-ffi.sh /build/output | |||
|
|||
FROM scratch as ffi_build_output | |||
|
|||
COPY --from=ffi_build /build/output/ ./ | |||
COPY --from=ffi_build /build/output/ ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitt: missing newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll get my IDE configured correctly one of these days
trace-utils/Cargo.toml
Outdated
|
||
[features] | ||
test-utils = ["httpmock", "testcontainers", "cargo_metadata"] | ||
test-utils = ["httpmock", "testcontainers", "cargo_metadata"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitt: missing newline
trace-utils/src/tracer_payload.rs
Outdated
#[cfg(test)] | ||
mod tests { | ||
use super::*; | ||
use crate::test_utils::create_test_span; | ||
use datadog_trace_protobuf::pb::TraceChunk; | ||
use datadog_trace_protobuf::pb::{Span, TraceChunk}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you remove the import of Span
everywhere except here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was not intentional, will fix.
trace-utils/src/tracer_payload.rs
Outdated
tracer_header_tags: &'a TracerHeaderTags<'a>, | ||
/// A boxed closure that processes each `TraceChunk`, allowing for custom logic to be applied | ||
/// during the conversion process. | ||
process_chunk: TracerPayloadChunkProcessor<'a>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we often use a ChunkProcessor ? If not it feels a bit heavy to have to create an empty closure every time we call it maybe it can be an Option or provide an alternative constructor which creates the empty closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. It currently only works for v07
and is only used by the mini-agent. For now, I created a DefaultTraceChunkProcessor
that can be passed so we don't have to implement the trait just to do a no-op. The compiler should optimize away the call to the processor in collect_trace_chunks
in this situation.
I'm inclined to leave it as is for now. I think we may want to do some further refactoring to TracerPayloads in the not to distant future and we may want to think of ways to streamline how some of this works.
start: 1, | ||
duration: 5, | ||
error: 0, | ||
meta: HashMap::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR. But we whould generally avoid using Hashmaps for data which doesn't require it.
e.g. in many tracers we replaced HashMap for meta and metrics with a list of key,values - for internal representation - because it avoid a lot of unnecessary memory operations. And is generally alot faster since the map is very rarely looked up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about that, but at least PHP does quite a bunch of meta/metrics lookups at serialization time.
Also, e.g. sampling can be tag based.
And there are also quite a bunch of tags which are tied to state, like sampling, decision makers etc. which end up requiring lookups.
I doubt that a key-value list would bring tangible benefits over a solid HashTable impl.
into TracerPayload. This functionality to turn msgpack bytes and header tags into a TracerPayloadCollection should live somewhere more common than the sidecar as it will need to also be used by data-pipeline (and presumably agentless). It only supports v04 at the moment, but should eventually support v05 and v07 as well.
Deserializing msgpack bytes into an internal representation is currently inefficient. Before work starts on fixing the performance issues we want to set some baseline benchmarks.
4d97808
to
33799c5
Compare
Co-authored-by: Bob Weinand <bob.weinand@datadoghq.com>
What does this PR do?
TracerPayloadCollection
from msgpack bytes andTracerHeaderTags
into thetrace_utils
crate as this logic can / should be used anywhereTracerPayloadCollection
is used. No changes to the logic were made. This logic includes the deserialization of traces in msgpack format into an internal representation of traces (currently pb). Previously, this functionality wasn't directly covered under unit tests. Now it is.TracerPayloadCollections
for v04 traces.Motivation
There is a known problem with excessive allocations when processing traces through libdatadog that is currently blocking the sidecar from being turned on by default for PHP customers. This problem also blocks adoption of data-pipeline by .NET and Node libraries. These benchmarks should help establish a baseline for forthcoming work to improve performance. By moving the deserialization of msgpack closer to the TracerPayloadCollection implementation it ensures common encapsulated logic for both data-pipeline and SIdecar.
Additional Notes
This implementation currently only works for v04. Work to support v05 and v07 should follow.
How to test the change?
Unit tests added.
benchmarks can be run via
cargo bench --package datadog-trace-utils