Trace Normalizer: add service + env tag normalization #106

Merged

thedavl merged 18 commits into main from david.lee/trace-normalize-service-name

Feb 8, 2023

Contributor

thedavl commented Feb 2, 2023

What does this PR do?

Adds service tag and env tag normalization to the normalizer + associated unit tests.

This normalizer is modeled very closely after the existing agent trace normalizer, which can be found here.

The next PR will cover:

normalize_trace
normalize_chunk

These last two functions in the normalizer are much shorter than normalize.

Motivation

The trace normalizer will be used in our serverless agentless POC, which will normalize traces before they are sent directly to the Datadog trace intake.

See: the serverless branch

Additional Notes

Tags allow non-ASCII alphabetic characters, while metric names don't.

How to test the change?

The changes have been manually tested using the agentless POC by normalizing span names and verifying that the normalization works as expected. Unit tests have been copied and translated from the agent normalizer unit tests written in Go.

thedavl added 4 commits

February 2, 2023 15:09


          add service tag + env tag normalization + unit tests

ab26e7b


          Update normalizer.rs

38d0c62


          lint

4058d5c


          remove .unwrap() outside of unit tests

2eb2940

thedavl changed the title ~~add service tag + env tag normalization + unit tests~~ Trace Normalizer: add service + env tag normalization

thedavl and others added 6 commits

February 6, 2023 08:14


          Merge branch 'main' into david.lee/trace-normalize-service-name

0ae59a7


          change fallback_service to take an &str

4236bd5


          modify fallback service

aca86be


          lint

263e106


          remove more .unwrap()

659c309


          add mountain of tag normalization unit tests

0e35bbd

thedavl marked this pull request as ready for review

February 6, 2023 18:25

thedavl requested a review from a team as a code owner

February 6, 2023 18:25

thedavl requested a review from morrisonlevi

February 6, 2023 18:25

thedavl marked this pull request as draft

February 6, 2023 19:04

thedavl marked this pull request as ready for review

February 6, 2023 19:04

DarcyRaynerDD reviewed

DarcyRaynerDD left a comment

Nice work, left some questions and nits

trace-normalization/src/normalize_utils.rs Outdated

trace-normalization/src/normalize_utils.rs Outdated

trace-normalization/src/normalize_utils.rs Outdated

trace-normalization/src/normalize_utils.rs

trace-normalization/src/normalizer.rs Outdated

trace-normalization/src/normalize_utils.rs

+                          }
+                          continue;
+                      }
+                      if cur_char.is_alphabetic() {

DarcyRaynerDD Feb 6, 2023

One thing to think about here is that is_alphabetic uses the Unicode definition of alphabetic characters, so it includes characters from all alphabets, not just ASCII. The helper function used elsewhere, by contrast, only checks for ASCII alphabetic characters.

https://github.com/DataDog/libdatadog/blob/main/trace-normalization/src/normalize_utils.rs#L162

Our documentation only mentions alphanumerics:

What does the Go trace agent do here? We should try to be consistent in which method we use.

Contributor Author

thedavl Feb 6, 2023

I tried to mirror exactly what the Go tag normalizer does:

- It uses isNormalizedASCIITag as a fast path (here), which:
  - only considers a lowercase ASCII letter as a valid starting char
  - only treats ASCII letters as valid (so no other alphabets)
- If the fast path fails, normalization happens, which allows chars from all alphabets.

Here is the Go tag normalizer

The helper function you linked is used in normalize_metric_names, whose Go counterpart normMetricNameParse also uses only the ASCII check.
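To make the distinction in this thread concrete, here is a minimal sketch of the two standard-library checks being compared. The function names `is_unicode_alpha`, `is_ascii_alpha`, and `non_ascii_letters` are hypothetical helpers written for illustration; they are not taken from the PR.

```rust
/// Unicode-aware check: true for letters from any alphabet.
/// This is the behavior of `char::is_alphabetic`.
fn is_unicode_alpha(c: char) -> bool {
    c.is_alphabetic()
}

/// ASCII-only check: true only for 'a'..='z' and 'A'..='Z'.
fn is_ascii_alpha(c: char) -> bool {
    c.is_ascii_alphabetic()
}

/// Hypothetical helper: collects the characters that the Unicode check
/// accepts but the ASCII check rejects.
fn non_ascii_letters(s: &str) -> Vec<char> {
    s.chars()
        .filter(|c| is_unicode_alpha(*c) && !is_ascii_alpha(*c))
        .collect()
}
```

A character such as 'é' passes the Unicode check but fails the ASCII one, which is why a tag path and a metric-name path can disagree on the same input.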

trace-normalization/src/normalize_utils.rs

+              }
+              pub(crate) fn is_valid_ascii_start_char(c: char) -> bool {
+                  ('a'..='z').contains(&c) || c == ':'

DarcyRaynerDD Feb 6, 2023

Same deal here. Also, are only lowercase characters valid as a start character?

Contributor Author

thedavl Feb 6, 2023

Yes, that's correct (corresponding function in the Go agent normalizer).
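As a quick illustration of the start-character rule, the sketch below reuses the check quoted in the diff and wraps it in a hypothetical helper, `starts_with_valid_char`, which is an assumption for demonstration and not part of the PR.

```rust
/// Same predicate as in the diff above: a lowercase ASCII letter or ':'.
fn is_valid_ascii_start_char(c: char) -> bool {
    ('a'..='z').contains(&c) || c == ':'
}

/// Hypothetical helper: true if the tag is non-empty and its first
/// character passes the start-character check.
fn starts_with_valid_char(tag: &str) -> bool {
    matches!(tag.chars().next(), Some(c) if is_valid_ascii_start_char(c))
}
```

Under this check, "env" and ":env" pass, while "Env", "1env", and the empty string do not, so they would not qualify for the ASCII fast path.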

thedavl and others added 4 commits

February 6, 2023 15:58


          address PR comments

12c80c2


          modify comment wording

7c3a31d


          remove let ... else

924c6d3


          Merge branch 'main' into david.lee/trace-normalize-service-name

be94c7a

thedavl requested a review from a team

February 7, 2023 19:57

morrisonlevi reviewed

Contributor

morrisonlevi left a comment

I have some stylistic suggestions. I'm happy to pair again if you want!

trace-normalization/src/normalize_utils.rs Outdated Show resolved Hide resolved

trace-normalization/src/normalize_utils.rs Outdated

Comment on lines 134 to 138

+                  for mut i in 0..tag.len() {
+                      let cur_char = match tag.chars().nth(i) {
+                          Some(c) => c,
+                          None => return false,
+                      };

Contributor

morrisonlevi Feb 8, 2023

I think you can pull the iterator out and then iterate manually using .next() instead of this juggling. Then, instead of calling .nth(i) to get the 'current' value, it will just be there, and 'next' can be obtained by calling .next() instead of .nth(i + 1).

Contributor Author

thedavl Feb 8, 2023

implemented!
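The suggestion can be sketched on a simplified, hypothetical rule ("no two consecutive underscores"), which is an assumption chosen for illustration and not the actual body of the function under review. The indexed version re-creates the iterator on every step, so each `.nth(i)` call walks the string from the start, and `tag.len()` counts bytes rather than chars. The iterator version walks the string once.

```rust
/// Index-based shape, similar to the snippet under review.
/// Each `.nth(i)` re-scans from the start of the string.
fn no_repeated_underscore_indexed(tag: &str) -> bool {
    for i in 0..tag.len() {
        let cur_char = match tag.chars().nth(i) {
            Some(c) => c,
            None => return false, // reached when len() in bytes exceeds the char count
        };
        if cur_char == '_' && tag.chars().nth(i + 1) == Some('_') {
            return false;
        }
    }
    true
}

/// Iterator-based shape: one pass, with the "next" character pulled
/// from the same iterator only when it is needed.
fn no_repeated_underscore(tag: &str) -> bool {
    let mut chars = tag.chars();
    while let Some(cur_char) = chars.next() {
        // `chars.next()` runs only if the current char is '_', so the
        // consumed character is either a second '_' (reject) or a
        // character that cannot start a "__" pair.
        if cur_char == '_' && chars.next() == Some('_') {
            return false;
        }
    }
    true
}
```

For ASCII input both versions agree. For non-ASCII input the indexed version returns false as soon as the byte index runs past the last char, which is one more reason to prefer the single iterator.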

thedavl and others added 2 commits

February 8, 2023 14:37


          Update trace-normalization/src/normalize_utils.rs

8b1f91a

Co-authored-by: Levi Morrison <levi.morrison@datadoghq.com>


          address PR comments

8be8691

thedavl requested a review from morrisonlevi

February 8, 2023 22:46

thedavl added 2 commits

February 8, 2023 14:55


          remove .peekable() in is_normalized_ascii_tag func

912f026


          Update normalize_utils.rs

0c8de51

morrisonlevi approved these changes

Contributor

morrisonlevi left a comment

Looks good from a generic Rust standpoint, as far as I can tell :)

thedavl merged commit 711bd43 into main

thedavl deleted the david.lee/trace-normalize-service-name branch

February 8, 2023 23:18

thedavl added a commit that referenced this pull request


          Squashed commit of the following:

6c24f99

commit 72af2a0
Author: David Lee <thedavl2001@gmail.com>
Date:   Mon Feb 13 15:44:57 2023 -0800

    Trace Normalizer: add trace & trace chunk normalization (#109)

commit 510472d
Author: Pawel Chojnacki <pawelchcki@gmail.com>
Date:   Thu Feb 9 15:05:33 2023 +0100

    Fixup build-telemetry-ffi and cbindgen.toml (#105)

commit 711bd43
Author: David Lee <thedavl2001@gmail.com>
Date:   Wed Feb 8 15:18:24 2023 -0800

    Trace Normalizer: add service + env tag normalization (#106)

    * add service tag + env tag normalization + unit tests

commit ee4bba8
Author: Ivo Anjo <ivo.anjo@datadoghq.com>
Date:   Tue Feb 7 15:38:13 2023 +0000

    Remove unneeded "gem signout" step from Ruby release (#108)

    **What does this PR do?**:

    This PR removes the now-unneeded "gem signout" steps during the Ruby
    release process.

    **Motivation**:

    In #85, we changed the way that we authenticate with rubygems.org when
    pushing a new libdatadog release.

    I just did a release with this new code, and noticed that because
    we no longer log in, but just use a limited API key, the "gem signout"
    does not do anything and emits an error.

    Here's what I saw when I ran `docker-compose run push_to_rubygems`:

    ```
    ... preparation of packages goes here...

    ERROR:  You are not currently signed in.
    Please input 'libdatadog ruby release key' from 'Profiling - Falcon' Datadog 1Password:
    (...key...)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0-x86_64-linux)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0-aarch64-linux)
    ERROR:  You are not currently signed in.
    ```

    Those two "ERROR:  You are not currently signed in" messages come from the
    "gem signout" steps, and that's why I'm removing them.

    **Additional Notes**:

    (N/A)

    **How to test the change?**:

    You can run `docker-compose run push_to_rubygems` and validate
    the errors will not show up again. This is safe because Rubygems
    does not allow re-releasing the same packages.

commit 8be5465
Author: Ivo Anjo <ivo.anjo@datadoghq.com>
Date:   Fri Feb 3 14:35:58 2023 +0000

    Package libdatadog v2.0.0 for Ruby (#107)

    **What does this PR do?**:

    This PR includes the changes documented in the
    "Releasing a new version to rubygems.org" part of the README:
    <https://github.com/DataDog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg>

    **Motivation**:

    Enable Ruby to use libdatadog v2.0.0.

    **Additional Notes**:

    (N/A)

    **How to test the change?**:

    This was locally checked with the Ruby profiler branch that already
    supports libdatadog 2.

commit 73b8e2e
Author: David Lee <thedavl2001@gmail.com>
Date:   Thu Feb 2 14:01:07 2023 -0800

    Create a span normalizer skeleton, fully implement span name normalization (#100)

commit 097ea5d
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Thu Feb 2 10:32:39 2023 -0700

    refactor(profiling)!: less chance of request double free (#103)

commit e3887c3
Author: Pawel Chojnacki <pawelchcki@gmail.com>
Date:   Mon Jan 30 18:06:46 2023 +0100

    Fix CI warnings (#104)

    * Fix warnings

    * clippy fix

commit 19b7f69
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 10:34:37 2023 -0700

    refactor(profiling)!: create FFI Error type and remove `*Result_drop` methods (#95)

    * refactor(profiling)!: create FFI Error type

    This extracts an FFI Error type `ddog_Error`. It contains an FFI Vec
    internally:

    ```rust
    /// Please treat this as opaque; do not reach into it, and especially
    /// don't write into it!
    pub struct Error {
        /// This is a String stuffed into the vec.
        message: Vec<u8>,
    }
    ```

    FFI result types have been updated e.g.:

    ```rust
    pub enum SendResult {
        HttpResponse(HttpStatus),
        Err(Error),
    }
    ```

    Instead of reaching into the buffer, use these APIs:

    ```c
    /**
     * # Safety
     * Only pass null or a valid reference to an Error.
     */
    void ddog_Error_drop(struct ddog_Error *error);

    /**
     * Returns a CharSlice of the error's message that is valid until the error
     * is dropped.
     * # Safety
     * Only pass null or a valid reference to an Error.
     */
    ddog_CharSlice ddog_Error_message(const struct ddog_Error *error);
    ```

    * Drop *Result_drop functions

    The `*Result_drop` methods have been removed:

    - ddog_prof_Exporter_NewResult_drop
    - ddog_prof_Exporter_Request_BuildResult_drop
    - ddog_prof_Exporter_SendResult_drop
    - ddog_prof_Profile_AddResult_drop
    - ddog_prof_Profile_SerializeResult_drop
    - ddog_Vec_Tag_PushResult_drop

    And these were added instead:

    - ddog_Error_drop (+ ddog_Error_message to get the message)
    - ddog_prof_EncodedProfile_drop
    - ddog_prof_Exporter_Request_drop

    * Make a note about #[must_use]

    * Add ddog_prof_EncodedProfile_drop

    Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>

commit ed1ee92
Author: Florian Engelhardt <florian.engelhardt@datadoghq.com>
Date:   Fri Jan 27 17:19:30 2023 +0100

    fix link to contribution guide (#102)

commit 1c445fe
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 09:02:54 2023 -0700

    feat(build-profiling-ffi.sh): support CARGO_TARGET_DIR (#101)

commit 88899ba
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 08:43:11 2023 -0700

    fix: clippy lints from Rust v1.67.0 release (#99)

    * fix clippy::uninlined_format_args

    * fix clippy::seek_to_start_instead_of_rewind

commit f2d0ed0
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Wed Jan 25 15:32:33 2023 -0700

    feat(profiling)!: pass errors through more FFI functions (#90)

    This changes the return types for FFI functions:
     - `ddog_prof_Profile_add`
     - `ddog_prof_Exporter_Request_build`

    Adds new structs:
     - `ddog_prof_Profile_AddResult`
     - `ddog_prof_Exporter_Request_BuildResult`

    And adds functions to drop them:
     - `ddog_prof_Profile_AddResult_drop`
     - `ddog_prof_Exporter_Request_BuildResult_drop`

    Removes a now-unnecessary newtype definition of struct
    `Request(exporter::Request)` as cbindgen handles the refactored
    code.

commit c93b7c1
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Mon Jan 23 10:56:21 2023 -0700

    chore: update dependencies (#96)

    * chore: update dependencies

    This fixes a dependabot alert:
    https://github.com/DataDog/libdatadog/security/dependabot/6

    With the updated dependencies, there was a new deprecation:

    > warning: use of deprecated associated function
    > `chrono::TimeZone::timestamp`: use `timestamp_opt()` instead

    I replaced it with `timestamp_opt` and unwrapped it, which is exactly
    what the now-deprecated `timestamp` function does:
    https://github.com/chronotope/chrono/blob/378efb157d674c01761f538d4450705c2b1766a4/src/offset/mod.rs#L343

    They deprecated it because they are working to not panic internally.

    * Bump LICENSE-3rdparty.yml

Merge branch 'main' into origin/david.lee/agentless-use-trace-normalizer
