-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trace Normalizer: add service + env tag normalization #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, left some questions and nits
} | ||
continue; | ||
} | ||
if cur_char.is_alphabetic() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to think about here, is that is_alphabetic uses the unicode definition of alphabetic characters, so includes characters from all alphabets, not just ascii. Whereas this helper function used elsewhere is only checking for ascii alpha characters.
https://github.com/DataDog/libdatadog/blob/main/trace-normalization/src/normalize_utils.rs#L162
Our documentation only mentions alphanumerics:
What does the go trace agent do here? We should try to be consistent in which method we use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to mirror exactly what the go tag normalizer does, which:
- uses
isNormalizedASCIITag
as a fast path (here)- Only considers a lowercase ascii alpha as a valid starting char
- only sees ascii alphas as valid (so no other languages)
- If the fast path fails, normalization will happen which allows chars from all alphabets
The helper function you linked is used in normalize_metric_names
, who's go counterpart normMetricNameParse also just uses the ASCII check
} | ||
|
||
pub(crate) fn is_valid_ascii_start_char(c: char) -> bool { | ||
('a'..='z').contains(&c) || c == ':' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same deal here, also, are only lower case characters valid as a start character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thats correct (corresponding function in the go agent normalizer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some stylistic suggestions. I'm happy to pair again if you want!
for mut i in 0..tag.len() { | ||
let cur_char = match tag.chars().nth(i) { | ||
Some(c) => c, | ||
None => return false, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can pull the iterator out, and then iterate manually using .next()
instead this juggling. Then instead of calling .nth(i)
to get the 'current' value, it will just be there. And 'next' can be obtained by calling .next()
instead of .nth(i + 1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implemented!
Co-authored-by: Levi Morrison <levi.morrison@datadoghq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a generic Rust standpoint, as far as I can tell :)
commit 72af2a0 Author: David Lee <thedavl2001@gmail.com> Date: Mon Feb 13 15:44:57 2023 -0800 Trace Normalizer: add trace & trace chunk normalization (#109) commit 510472d Author: Pawel Chojnacki <pawelchcki@gmail.com> Date: Thu Feb 9 15:05:33 2023 +0100 Fixup build-telemetry-ffi and cbindgen.toml (#105) commit 711bd43 Author: David Lee <thedavl2001@gmail.com> Date: Wed Feb 8 15:18:24 2023 -0800 Trace Normalizer: add service + env tag normalization (#106) * add service tag + env tag normalization + unit tests commit ee4bba8 Author: Ivo Anjo <ivo.anjo@datadoghq.com> Date: Tue Feb 7 15:38:13 2023 +0000 Remove unneeded "gem signout" step from Ruby release (#108) **What does this PR do?**: This PR removes the now-unneeded "gem signout" steps during the Ruby release process. **Motivation**: In #85, we changed the way that we authenticate with rubygems.org when pushing a new libdatadog release. I just did a release with this new code, and noticed that because we no longer log in, but just use a limited API key, the "gem signout" does not do anything and emits an error. Here's what I saw when I ran `docker-compose run push_to_rubygems`: ``` ... preparation of packages goes here... ERROR: You are not currently signed in. Please input 'libdatadog ruby release key' from 'Profiling - Falcon' Datadog 1Password: (...key...) Pushing gem to https://rubygems.org... You have enabled multi-factor authentication. Please enter OTP code. Code: (...) Successfully registered gem: libdatadog (2.0.0.1.0) Pushing gem to https://rubygems.org... You have enabled multi-factor authentication. Please enter OTP code. Code: (...) Successfully registered gem: libdatadog (2.0.0.1.0-x86_64-linux) Pushing gem to https://rubygems.org... You have enabled multi-factor authentication. Please enter OTP code. Code: (...) Successfully registered gem: libdatadog (2.0.0.1.0-aarch64-linux) ERROR: You are not currently signed in. ``` Those two "ERROR: You are not currently signed in" come from the "gem signout" steps, and what's why I'm removing them. **Additional Notes**: (N/A) **How to test the change?**: You can run `docker-compose run push_to_rubygems` and validate the errors will not show up again. This is safe because Rubygems does not allow re-releasing the same packages. commit 8be5465 Author: Ivo Anjo <ivo.anjo@datadoghq.com> Date: Fri Feb 3 14:35:58 2023 +0000 Package libdatadog v2.0.0 for Ruby (#107) **What does this PR do?**: This PR includes the changes documented in the "Releasing a new version to rubygems.org" part of the README: <https://github.com/DataDog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg> **Motivation**: Enable Ruby to use libdatadog v2.0.0. **Additional Notes**: (N/A) **How to test the change?**: This was locally checked with the Ruby profiler branch that already supports libdatadog 2. commit 73b8e2e Author: David Lee <thedavl2001@gmail.com> Date: Thu Feb 2 14:01:07 2023 -0800 Create a span normalizer skeleton, fully implement span name normalization (#100) commit 097ea5d Author: Levi Morrison <levi.morrison@datadoghq.com> Date: Thu Feb 2 10:32:39 2023 -0700 refactor(profiling)!: less chance of request double free (#103) commit e3887c3 Author: Pawel Chojnacki <pawelchcki@gmail.com> Date: Mon Jan 30 18:06:46 2023 +0100 Fix CI warnings (#104) * Fix warnings * clippy fix commit 19b7f69 Author: Levi Morrison <levi.morrison@datadoghq.com> Date: Fri Jan 27 10:34:37 2023 -0700 refactor(profiling)!: create FFI Error type and remove `*Result_drop` methods (#95) * refactor(profiling)!: create FFI Error type This extracts an FFI Error type `ddog_Error`. It contains an FFI Vec internally: ```rust /// Please treat this as opaque; do not reach into it, and especially /// don't write into it! pub struct Error { /// This is a String stuffed into the vec. message: Vec<u8>, } ``` FFI result types have been updated e.g.: ```rust pub enum SendResult { HttpResponse(HttpStatus), Err(Error), } ``` Instead of reaching into the buffer, use these APIs: ```c /** * # Safety * Only pass null or a valid reference to an Error. */ void ddog_Error_drop(struct ddog_Error *error); /** * Returns a CharSlice of the error's message that is valid until the error * is dropped. * # Safety * Only pass null or a valid reference to an Error. */ ddog_CharSlice ddog_Error_message(const struct ddog_Error *error); ``` * Drop *Result_drop functions The `*Result_drop` methods have been removed: - ddog_prof_Exporter_NewResult_drop - ddog_prof_Exporter_Request_BuildResult_drop - ddog_prof_Exporter_SendResult_drop - ddog_prof_Profile_AddResult_drop - ddog_prof_Profile_SerializeResult_drop - ddog_Vec_Tag_PushResult_drop And these were added instead: - ddog_Error_drop (+ ddog_Error_message to get the message) - ddog_prof_EncodedProfile_drop - ddog_prof_Exporter_Request_drop * Make a note about #[must_use] * Add ddog_prof_EncodedProfile_drop Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com> commit ed1ee92 Author: Florian Engelhardt <florian.engelhardt@datadoghq.com> Date: Fri Jan 27 17:19:30 2023 +0100 fix link to contribution guide (#102) commit 1c445fe Author: Levi Morrison <levi.morrison@datadoghq.com> Date: Fri Jan 27 09:02:54 2023 -0700 feat(build-profiling-ffi.sh): support CARGO_TARGET_DIR (#101) commit 88899ba Author: Levi Morrison <levi.morrison@datadoghq.com> Date: Fri Jan 27 08:43:11 2023 -0700 fix: clippy lints from Rust v1.67.0 release (#99) * fix clippy::uninlined_format_args * fix clippy::seek_to_start_instead_of_rewind commit f2d0ed0 Author: Levi Morrison <levi.morrison@datadoghq.com> Date: Wed Jan 25 15:32:33 2023 -0700 feat(profiling)!: pass errors through more FFI functions (#90) This changes the return types for FFI functions: - `ddog_prof_Profile_add` - `ddog_prof_Exporter_Request_build` Adds new structs: - `ddog_prof_Profile_AddResult` - `ddog_prof_Exporter_Request_BuildResult` And adds functions to drop them: - `ddog_prof_Profile_AddResult_drop` - `ddog_prof_Exporter_Request_BuildResult_drop` Removes a now-unnecessary newtype definition of struct `Request(exporter::Request)` as cbindgen handles the refactored code. commit c93b7c1 Author: Levi Morrison <levi.morrison@datadoghq.com> Date: Mon Jan 23 10:56:21 2023 -0700 chore: update dependencies (#96) * chore: update dependencies This fixes a dependabot alert: https://github.com/DataDog/libdatadog/security/dependabot/6 With the updated dependencies, there was a new deprecation: > warning: use of deprecated associated function > `chrono::TimeZone::timestamp`: use `timestamp_opt()` instead I replaced it with `timestamp_opt` and unwrapped it, which is exactly what the now-deprecated `timestamp` function does: https://github.com/chronotope/chrono/blob/378efb157d674c01761f538d4450705c2b1766a4/src/offset/mod.rs#L343 They deprecated it because they are working to not panic internally. * Bump LICENSE-3rdparty.yml Merge branch 'main' into origin/david.lee/agentless-use-trace-normalizer
What does this PR do?
Adds service tag and env tag normalization to the normalizer + associated unit tests.
This normalizer is modeled very closely after the existing agent trace normalizer, which can be found here.
The next PR will cover:
normalize_trace
normalize_chunk
These two last functions in the normalizer are much shorter than
normalize
Motivation
The trace normalizer will be used in our serverless agentless POC, which will use it to normalize traces before being sent directly to the datadog trace intake.
See: the serverless branch
Additional Notes
How to test the change?
Describe here in detail how the change can be validated.
The changes have been manually tested using the agentless POC, normalizing span names and verifying the normalization works as expected. Unit tests have been copied/translated from the agent normalizer unit tests written in go.