Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace Normalizer: add service + env tag normalization #106

Merged
merged 18 commits into from
Feb 8, 2023

Conversation

thedavl
Copy link
Contributor

@thedavl thedavl commented Feb 2, 2023

What does this PR do?

Adds service tag and env tag normalization to the normalizer + associated unit tests.

This normalizer is modeled very closely after the existing agent trace normalizer, which can be found here.

The next PR will cover:

  • normalize_trace
  • normalize_chunk

These two last functions in the normalizer are much shorter than normalize

Motivation

The trace normalizer will be used in our serverless agentless POC, which will use it to normalize traces before being sent directly to the datadog trace intake.

See: the serverless branch

Additional Notes

  • Tags allow non ascii alpha characters, while metrics names don't

How to test the change?

Describe here in detail how the change can be validated.

The changes have been manually tested using the agentless POC, normalizing span names and verifying the normalization works as expected. Unit tests have been copied/translated from the agent normalizer unit tests written in go.

@thedavl thedavl changed the title add service tag + env tag normalization + unit tests Trace Normalizer: add service + env tag normalization Feb 3, 2023
@thedavl thedavl marked this pull request as ready for review February 6, 2023 18:25
@thedavl thedavl requested a review from a team as a code owner February 6, 2023 18:25
@thedavl thedavl marked this pull request as draft February 6, 2023 19:04
@thedavl thedavl marked this pull request as ready for review February 6, 2023 19:04
Copy link

@DarcyRaynerDD DarcyRaynerDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, left some questions and nits

trace-normalization/src/normalize_utils.rs Outdated Show resolved Hide resolved
trace-normalization/src/normalize_utils.rs Outdated Show resolved Hide resolved
trace-normalization/src/normalize_utils.rs Outdated Show resolved Hide resolved
trace-normalization/src/normalize_utils.rs Show resolved Hide resolved
trace-normalization/src/normalizer.rs Outdated Show resolved Hide resolved
}
continue;
}
if cur_char.is_alphabetic() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to think about here, is that is_alphabetic uses the unicode definition of alphabetic characters, so includes characters from all alphabets, not just ascii. Whereas this helper function used elsewhere is only checking for ascii alpha characters.

https://github.com/DataDog/libdatadog/blob/main/trace-normalization/src/normalize_utils.rs#L162

Our documentation only mentions alphanumerics:

What does the go trace agent do here? We should try to be consistent in which method we use

Copy link
Contributor Author

@thedavl thedavl Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to mirror exactly what the go tag normalizer does, which:

  • uses isNormalizedASCIITag as a fast path (here)
    • Only considers a lowercase ascii alpha as a valid starting char
    • only sees ascii alphas as valid (so no other languages)
  • If the fast path fails, normalization will happen which allows chars from all alphabets

Here is the go tag normalizer

The helper function you linked is used in normalize_metric_names, who's go counterpart normMetricNameParse also just uses the ASCII check

}

pub(crate) fn is_valid_ascii_start_char(c: char) -> bool {
('a'..='z').contains(&c) || c == ':'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal here, also, are only lower case characters valid as a start character?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thedavl thedavl requested a review from a team February 7, 2023 19:57
Copy link
Contributor

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some stylistic suggestions. I'm happy to pair again if you want!

trace-normalization/src/normalize_utils.rs Outdated Show resolved Hide resolved
Comment on lines 134 to 138
for mut i in 0..tag.len() {
let cur_char = match tag.chars().nth(i) {
Some(c) => c,
None => return false,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can pull the iterator out, and then iterate manually using .next() instead this juggling. Then instead of calling .nth(i) to get the 'current' value, it will just be there. And 'next' can be obtained by calling .next() instead of .nth(i + 1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented!

thedavl and others added 2 commits February 8, 2023 14:37
Co-authored-by: Levi Morrison <levi.morrison@datadoghq.com>
Copy link
Contributor

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from a generic Rust standpoint, as far as I can tell :)

@thedavl thedavl merged commit 711bd43 into main Feb 8, 2023
@thedavl thedavl deleted the david.lee/trace-normalize-service-name branch February 8, 2023 23:18
thedavl added a commit that referenced this pull request Feb 14, 2023
commit 72af2a0
Author: David Lee <thedavl2001@gmail.com>
Date:   Mon Feb 13 15:44:57 2023 -0800

    Trace Normalizer: add trace & trace chunk normalization (#109)

commit 510472d
Author: Pawel Chojnacki <pawelchcki@gmail.com>
Date:   Thu Feb 9 15:05:33 2023 +0100

    Fixup build-telemetry-ffi and cbindgen.toml (#105)

commit 711bd43
Author: David Lee <thedavl2001@gmail.com>
Date:   Wed Feb 8 15:18:24 2023 -0800

    Trace Normalizer: add service + env tag normalization (#106)

    * add service tag + env tag normalization + unit tests

commit ee4bba8
Author: Ivo Anjo <ivo.anjo@datadoghq.com>
Date:   Tue Feb 7 15:38:13 2023 +0000

    Remove unneeded "gem signout" step from Ruby release (#108)

    **What does this PR do?**:

    This PR removes the now-unneeded "gem signout" steps during the Ruby
    release process.

    **Motivation**:

    In #85, we changed the way that we authenticate with rubygems.org when
    pushing a new libdatadog release.

    I just did a release with this new code, and noticed that because
    we no longer log in, but just use a limited API key, the "gem signout"
    does not do anything and emits an error.

    Here's what I saw when I ran `docker-compose run push_to_rubygems`:

    ```
    ... preparation of packages goes here...

    ERROR:  You are not currently signed in.
    Please input 'libdatadog ruby release key' from 'Profiling - Falcon' Datadog 1Password:
    (...key...)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0-x86_64-linux)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0-aarch64-linux)
    ERROR:  You are not currently signed in.
    ```

    Those two "ERROR:  You are not currently signed in" come from the
    "gem signout" steps, and what's why I'm removing them.

    **Additional Notes**:

    (N/A)

    **How to test the change?**:

    You can run `docker-compose run push_to_rubygems` and validate
    the errors will not show up again. This is safe because Rubygems
    does not allow re-releasing the same packages.

commit 8be5465
Author: Ivo Anjo <ivo.anjo@datadoghq.com>
Date:   Fri Feb 3 14:35:58 2023 +0000

    Package libdatadog v2.0.0 for Ruby (#107)

    **What does this PR do?**:

    This PR includes the changes documented in the
    "Releasing a new version to rubygems.org" part of the README:
    <https://github.com/DataDog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg>

    **Motivation**:

    Enable Ruby to use libdatadog v2.0.0.

    **Additional Notes**:

    (N/A)

    **How to test the change?**:

    This was locally checked with the Ruby profiler branch that already
    supports libdatadog 2.

commit 73b8e2e
Author: David Lee <thedavl2001@gmail.com>
Date:   Thu Feb 2 14:01:07 2023 -0800

    Create a span normalizer skeleton, fully implement span name normalization (#100)

commit 097ea5d
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Thu Feb 2 10:32:39 2023 -0700

    refactor(profiling)!: less chance of request double free (#103)

commit e3887c3
Author: Pawel Chojnacki <pawelchcki@gmail.com>
Date:   Mon Jan 30 18:06:46 2023 +0100

    Fix CI warnings (#104)

    * Fix warnings

    * clippy fix

commit 19b7f69
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 10:34:37 2023 -0700

    refactor(profiling)!: create FFI Error type and remove `*Result_drop` methods (#95)

    * refactor(profiling)!: create FFI Error type

    This extracts an FFI Error type `ddog_Error`. It contains an FFI Vec
    internally:

    ```rust
    /// Please treat this as opaque; do not reach into it, and especially
    /// don't write into it!
    pub struct Error {
        /// This is a String stuffed into the vec.
        message: Vec<u8>,
    }
    ```

    FFI result types have been updated e.g.:

    ```rust
    pub enum SendResult {
        HttpResponse(HttpStatus),
        Err(Error),
    }
    ```

    Instead of reaching into the buffer, use these APIs:

    ```c
    /**
     * # Safety
     * Only pass null or a valid reference to an Error.
     */
    void ddog_Error_drop(struct ddog_Error *error);

    /**
     * Returns a CharSlice of the error's message that is valid until the error
     * is dropped.
     * # Safety
     * Only pass null or a valid reference to an Error.
     */
    ddog_CharSlice ddog_Error_message(const struct ddog_Error *error);
    ```

    * Drop *Result_drop functions

    The `*Result_drop` methods have been removed:

    - ddog_prof_Exporter_NewResult_drop
    - ddog_prof_Exporter_Request_BuildResult_drop
    - ddog_prof_Exporter_SendResult_drop
    - ddog_prof_Profile_AddResult_drop
    - ddog_prof_Profile_SerializeResult_drop
    - ddog_Vec_Tag_PushResult_drop

    And these were added instead:

    - ddog_Error_drop (+ ddog_Error_message to get the message)
    - ddog_prof_EncodedProfile_drop
    - ddog_prof_Exporter_Request_drop
    * Make a note about #[must_use]

    * Add ddog_prof_EncodedProfile_drop

    Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>

commit ed1ee92
Author: Florian Engelhardt <florian.engelhardt@datadoghq.com>
Date:   Fri Jan 27 17:19:30 2023 +0100

    fix link to contribution guide (#102)

commit 1c445fe
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 09:02:54 2023 -0700

    feat(build-profiling-ffi.sh): support CARGO_TARGET_DIR (#101)

commit 88899ba
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 08:43:11 2023 -0700

    fix: clippy lints from Rust v1.67.0 release (#99)

    * fix clippy::uninlined_format_args

    * fix clippy::seek_to_start_instead_of_rewind

commit f2d0ed0
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Wed Jan 25 15:32:33 2023 -0700

    feat(profiling)!: pass errors through more FFI functions (#90)

    This changes the return types for FFI functions:
     - `ddog_prof_Profile_add`
     - `ddog_prof_Exporter_Request_build`

    Adds new structs:
     - `ddog_prof_Profile_AddResult`
     - `ddog_prof_Exporter_Request_BuildResult`

    And adds functions to drop them:
     - `ddog_prof_Profile_AddResult_drop`
     - `ddog_prof_Exporter_Request_BuildResult_drop`

    Removes a now-unnecessary newtype definition of struct
    `Request(exporter::Request)` as cbindgen handles the refactored
    code.

commit c93b7c1
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Mon Jan 23 10:56:21 2023 -0700

    chore: update dependencies (#96)

    * chore: update dependencies

    This fixes a dependabot alert:
    https://github.com/DataDog/libdatadog/security/dependabot/6

    With the updated dependencies, there was a new deprecation:

    > warning: use of deprecated associated function
    > `chrono::TimeZone::timestamp`: use `timestamp_opt()` instead

    I replaced it with `timestamp_opt` and unwrapped it, which is exactly
    what the now-deprecated `timestamp` function does:
    https://github.com/chronotope/chrono/blob/378efb157d674c01761f538d4450705c2b1766a4/src/offset/mod.rs#L343

    They deprecated it because they are working to not panic internally.

    * Bump LICENSE-3rdparty.yml
Merge branch 'main' into origin/david.lee/agentless-use-trace-normalizer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants