Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(profiling)!: create FFI Error type and remove *Result_drop methods #95

Merged
merged 10 commits into from
Jan 27, 2023

Conversation

morrisonlevi
Copy link
Contributor

@morrisonlevi morrisonlevi commented Jan 21, 2023

What does this PR do?

This extracts an FFI Error type aka ddog_Error and it contains an FFI Vec internally.

/// Please treat this as opaque; do not reach into it, and especially don't
/// write into it!
#[derive(Debug)]
#[repr(C)]
pub struct Error {
    /// This is a String stuffed into the vec.
    message: Vec<u8>,
}

Then, the FFI error types use this in their Err field, instead of ddcommon_ffi::Vec<u8>:

#[repr(C)]
pub enum SendResult {
    HttpResponse(HttpStatus),
    Err(Error),
}

The *Result_drop methods have been removed:

  • ddog_prof_Exporter_NewResult_drop
  • ddog_prof_Exporter_Request_BuildResult_drop
  • ddog_prof_Exporter_SendResult_drop
  • ddog_prof_Profile_AddResult_drop
  • ddog_prof_Profile_SerializeResult_drop
  • ddog_Vec_Tag_PushResult_drop

And these were added instead:

  • ddog_Error_drop (+ ddog_Error_message to get the message)
  • ddog_prof_EncodedProfile_drop
  • ddog_prof_Exporter_Request_drop

Motivation

I thought it was a bit nicer this way to not have to reach into the guts, and instead have an API.

Additionally, we realized that with the *Result_drop APIs it was a bit easy to get into double-free scenarios, so we removed them.

Additional Notes

I wish we could also extract a generic Result type, but the cbindgen output leaves a bad API. Maybe once some PRs land. The idea would be to hard-code the error type to always be the FFI Error:

#[repr(C)]
enum Result<T> {
    Ok(T),
    Err(Error), // the Error type being the one introduced here
};

How to test the change?

Use the new ddog_Error_message and ddog_Error_drop APIs:

/**
 * # Safety
 * Only pass null or a valid reference to an Error.
 */
void ddog_Error_drop(struct ddog_Error *error);

/**
 * Returns a CharSlice of the error's message that is valid until the error
 * is dropped.
 * # Safety
 * Only pass null or a valid reference to an Error.
 */
ddog_CharSlice ddog_Error_message(const struct ddog_Error *error);

As well as adjust your dropping as the *Result_drop APIs have been removed. You need to drop each case, if necessary (don't need to drop a uint64_t, obviously).

@morrisonlevi morrisonlevi changed the title refactor(profiling): create FFI Error type refactor(profiling)!: create FFI Error type Jan 21, 2023
@ivoanjo
Copy link
Member

ivoanjo commented Jan 24, 2023

So... I'm a bit unsure about this one.

Things I like:

  • The idea of having a single "object" for containing the error, because that allows us to easily feed more information there (maybe some day the libdatadog stack trace? more details?) without having to change the API again.

  • The FFI code seems cleaner and easier to follow, since it can focus on "this is an error", and the string handling stuff is only in a single place.

Things I'm not quite sure:

  • This doesn't make it easier for the client to print these strings. E.g. consider that I turn every error into a Ruby string:

    VALUE err_details = ruby_string_from_prof_vec_u8(exporter_result.err);

    so I'm imagining I'd update it to:

    VALUE err_details = ruby_string_from_error(exporter_result.err);

    (where ruby_string_from_error calls ddog_Error_message.)

    So this doesn't particularly help out this part...

  • One apparent benefit of this change would be to have a generic drop for every error. Which is true but... I don't think it alleviates the need for any of the other _drop methods, since there are still going to be cases where the clients need to drop the non-faulty result.

    Thus the API contract goes from "you need to either consume or drop most objects" to "you need to either consume or drop most objects, but if it's an error you can drop the error message instead", e.g. there's now two ways of doing the same thing.

    This is not necessarily a blocker; I'm thinking we can still keep the error, and not add the error-specific drop, which would prevent this issue.

  • (Minor: The change to prefix arguments with _, while it makes sense from the Rust side, it's propagated to the FFI headers and its meaning there is kinda confusing. I realize that we were using a mix of _ and no _, but I actually think the right thing in the FFI is to never use _)

@morrisonlevi
Copy link
Contributor Author

Thus the API contract goes from "you need to either consume or drop most objects" to "you need to either consume or drop most objects, but if it's an error you can drop the error message instead", e.g. there's now two ways of doing the same thing.

What got me started on this path was actually a case where dropping the result was likely to result in double-free. So for that type, I specifically would not create a drop for the result type, and force callers to drop the error case. I'm unsure if that ought to generalize or not. Maybe we should do a case-by-case review?

Base automatically changed from levi/ffi-errors to main January 25, 2023 22:32
This extracts an FFI Error type `ddog_Error`. It contains an FFI Vec
internally:

```rust
/// Please treat this as opaque; do not reach into it, and especially
/// don't write into it!
pub struct Error {
    /// This is a String stuffed into the vec.
    message: Vec<u8>,
}
```

FFI result types have been updated e.g.:

```rust
pub enum SendResult {
    HttpResponse(HttpStatus),
    Err(Error),
}
```

Instead of reaching into the buffer, use these APIs:

```c
/**
 * # Safety
 * Only pass null or a valid reference to an Error.
 */
void ddog_Error_drop(struct ddog_Error *error);

/**
 * Returns a CharSlice of the error's message that is valid until the error
 * is dropped.
 * # Safety
 * Only pass null or a valid reference to an Error.
 */
ddog_CharSlice ddog_Error_message(const struct ddog_Error *error);
```
I think API usage is more likely to be correct when each case is
dropped as needed. This shows the diff for our examples.
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, the guiding principle of the latest version is: there is never any _drop for a FooResult, there's only _drop (if needed) for the contents of FooResult -- there's one for the error, and another (if needed) for the non-error return.

I think I like it. There's only one correct way of doing things, and it de-emphasizes the FooResults from being a thing you should care about vs just being effectively a shortcut for returning a (success flag, something) from our functions that would not exist if C allowed multiple returns natively.

One thing that occurred to me is -- should we add a note somewhere that when adding functions that return results via the FFI, they should always be marked as #[must_use]? That way clients get a compiler complaint if they just use a function that returns a result in a fire-and-forget-way (e.g. they're probably missing a drop).

I checked and every function right now is correctly tagged, but it may be useful to document this somewhere so we don't forget this key fact :)

profiling-ffi/src/profiles.rs Outdated Show resolved Hide resolved
Comment on lines +69 to +70
// Leave an empty String in place to help with double-free issues.
let mut tmp = Error::from(String::new());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an empty string a ddog_Vec_U8 with ptr set to NULL?

Copy link
Contributor Author

@morrisonlevi morrisonlevi Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not null, but doesn't use an allocation. If you follow the implementation through, you eventually see this:

    pub const fn new_in(alloc: A) -> Self {
        // `cap: 0` means "unallocated". zero-sized types are ignored.
        Self { ptr: Unique::dangling(), cap: 0, alloc }
    }

And Unique::dangling() uses a NonNull::dangling(), which has documentation:

Creates a new NonNull that is dangling, but well-aligned.
This is useful for initializing types which lazily allocate, like Vec::new does.

So, it's a non-null value that's not backed by an allocation, aka dangling. So, after the _drop() we no longer need to free anything, but if someone accidentally uses ddog_Error_drop or ddog_Error_message then it will still be memory-safe and not have double-free nor user-after-free issues. It shouldn't be relied on by the API consumer, though (fix your code, please!).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never read this, so nothing to fix! But I was curious what ended up there :)

Comment on lines +18 to 21
void print_error(const char *s, const ddog_Error &err) {
auto charslice = ddog_Error_message(&err);
printf("%s (%.*s)\n", s, static_cast<int>(charslice.len), charslice.ptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since the error string is always going to be printed and accessed from the ffi client side, I wonder if it'd be nice to just make it a char * to save the effort of every API client having to turn it into a char * one way or another :)

Copy link
Contributor Author

@morrisonlevi morrisonlevi Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a ddog_CharSlice, which does use char* for it's pointer. We can append a null byte for convenience if you want (as in debugger convenience), but that doesn't guarantee that there aren't any interior null bytes. Meaning, you need the length anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, what I was thinking is -- for error messages specifically libdatadog could ensure that it output a real bona-fide char * with only a null byte at the end so that clients could print it directly.

@morrisonlevi morrisonlevi marked this pull request as ready for review January 26, 2023 16:08
@morrisonlevi morrisonlevi requested a review from a team as a code owner January 26, 2023 16:08
@morrisonlevi morrisonlevi requested a review from a team as a code owner January 26, 2023 17:08
@morrisonlevi
Copy link
Contributor Author

I checked and every function right now is correctly tagged, but it may be useful to document this somewhere so we don't forget this key fact :)

We can document it, yes, but will the programmer see/remember the documentation? :)

I added a README.md file to profiling-ffi.

@morrisonlevi
Copy link
Contributor Author

@ivoanjo, can you double-check before I commit? I did a leak check and it turns out on main we are leaking in the exporter.cpp example (I think it's a missing ddog_prof_Profile_SerializeResult_drop on the happy path). This branch doesn't leak anything, though.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last check with some last-minute suggested fixes!

Otherwise I've looked at the generated C headers and they also look good.

One last thing that may be interesting to mention in the PR description is that the following were dropped (pun not intended):

  • ddog_prof_Exporter_NewResult_drop
  • ddog_prof_Exporter_Request_BuildResult_drop
  • ddog_prof_Exporter_SendResult_drop
  • ddog_prof_Profile_AddResult_drop
  • ddog_prof_Profile_SerializeResult_drop
  • ddog_Vec_Tag_PushResult_drop

And these were added instead:

  • ddog_Error_drop (+ ddog_Error_message to get the message)
  • ddog_prof_EncodedProfile_drop
  • ddog_prof_Exporter_Request_drop

/// If successful, builds a `ddog_prof_Exporter_Request` object based on the profile data supplied.
/// If unsuccessful, it returns an error message.
///
/// The main use of the `ddog_prof_Exporter_Request` object is to be used in
/// `ddog_prof_Exporter_send`, which will take ownership of the object. Do not pass it to
/// `ddog_prof_Exporter_Request_drop` in such cases, nor call
/// `ddog_prof_Exporter_Request_BuildResult_drop` on the result!
/// `ddog_prof_Exporter_Request_drop` after that!
///
/// # Safety
/// The `exporter` and the files inside of the `files` slice need to have been
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Just noticed that this is actually incorrect -- the files can come from elsewhere e.g. for metrics.json or code-provenance.json :)

Comment on lines -187 to -191
/// Drops the result. Since `ddog_prof_Exporter_send` will take ownership of
/// the Request object, do not call this method if you call
/// `ddog_prof_Exporter_send` on the `ok` value!
#[no_mangle]
pub extern "C" fn ddog_prof_Exporter_Request_BuildResult_drop(_result: RequestBuildResult) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that this one was correctly removed BUT we're missing ddog_prof_Exporter_Request_drop. We even reference it in some comments, but it's nowhere to be found :)

I actually use it in the Ruby profiler mostly because the files in the request are pointing to memory that is managed by Ruby so I do this dance of:

  • Grab VM lock
  • Build request (Ruby is not mutating the files because I'm holding the lock)
  • Release lock
  • Call send

Between "Build request" and "Call send" Ruby may want to abort (e.g. let's imagine the VM really wants to shut down) and in that situation it's when I used this API:

  if (pending_exception) {
    // If we got here send did not run, so we need to explicitly dispose of the request
    ddog_prof_Exporter_Request_drop(request);

    // Let Ruby propagate the exception. This will not return.
    rb_jump_tag(pending_exception);
  }

/// * `request` - Takes ownership of the request. Do not call `ddog_prof_Request_drop` nor
/// `ddog_prof_Exporter_SendResult_drop` on `request` after calling
/// `ddog_prof_Exporter_send` on it.
/// * `request` - Takes ownership of the request. Do not call `ddog_prof_Request_drop` on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Fixing namespace :)

Suggested change
/// * `request` - Takes ownership of the request. Do not call `ddog_prof_Request_drop` on
/// * `request` - Takes ownership of the request. Do not call `ddog_prof_Exporter_Request_drop` on

@morrisonlevi morrisonlevi changed the title refactor(profiling)!: create FFI Error type refactor(profiling)!: create FFI Error type and adjust Result* dropping Jan 27, 2023
@morrisonlevi morrisonlevi changed the title refactor(profiling)!: create FFI Error type and adjust Result* dropping refactor(profiling)!: create FFI Error type and remove *Result_drop methods Jan 27, 2023
@morrisonlevi morrisonlevi merged commit 19b7f69 into main Jan 27, 2023
@morrisonlevi morrisonlevi deleted the levi/error branch January 27, 2023 17:34
thedavl added a commit that referenced this pull request Feb 14, 2023
commit 72af2a0
Author: David Lee <thedavl2001@gmail.com>
Date:   Mon Feb 13 15:44:57 2023 -0800

    Trace Normalizer: add trace & trace chunk normalization (#109)

commit 510472d
Author: Pawel Chojnacki <pawelchcki@gmail.com>
Date:   Thu Feb 9 15:05:33 2023 +0100

    Fixup build-telemetry-ffi and cbindgen.toml (#105)

commit 711bd43
Author: David Lee <thedavl2001@gmail.com>
Date:   Wed Feb 8 15:18:24 2023 -0800

    Trace Normalizer: add service + env tag normalization (#106)

    * add service tag + env tag normalization + unit tests

commit ee4bba8
Author: Ivo Anjo <ivo.anjo@datadoghq.com>
Date:   Tue Feb 7 15:38:13 2023 +0000

    Remove unneeded "gem signout" step from Ruby release (#108)

    **What does this PR do?**:

    This PR removes the now-unneeded "gem signout" steps during the Ruby
    release process.

    **Motivation**:

    In #85, we changed the way that we authenticate with rubygems.org when
    pushing a new libdatadog release.

    I just did a release with this new code, and noticed that because
    we no longer log in, but just use a limited API key, the "gem signout"
    does not do anything and emits an error.

    Here's what I saw when I ran `docker-compose run push_to_rubygems`:

    ```
    ... preparation of packages goes here...

    ERROR:  You are not currently signed in.
    Please input 'libdatadog ruby release key' from 'Profiling - Falcon' Datadog 1Password:
    (...key...)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0-x86_64-linux)
    Pushing gem to https://rubygems.org...
    You have enabled multi-factor authentication. Please enter OTP code.
    Code:   (...)
    Successfully registered gem: libdatadog (2.0.0.1.0-aarch64-linux)
    ERROR:  You are not currently signed in.
    ```

    Those two "ERROR:  You are not currently signed in" come from the
    "gem signout" steps, and what's why I'm removing them.

    **Additional Notes**:

    (N/A)

    **How to test the change?**:

    You can run `docker-compose run push_to_rubygems` and validate
    the errors will not show up again. This is safe because Rubygems
    does not allow re-releasing the same packages.

commit 8be5465
Author: Ivo Anjo <ivo.anjo@datadoghq.com>
Date:   Fri Feb 3 14:35:58 2023 +0000

    Package libdatadog v2.0.0 for Ruby (#107)

    **What does this PR do?**:

    This PR includes the changes documented in the
    "Releasing a new version to rubygems.org" part of the README:
    <https://github.com/DataDog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg>

    **Motivation**:

    Enable Ruby to use libdatadog v2.0.0.

    **Additional Notes**:

    (N/A)

    **How to test the change?**:

    This was locally checked with the Ruby profiler branch that already
    supports libdatadog 2.

commit 73b8e2e
Author: David Lee <thedavl2001@gmail.com>
Date:   Thu Feb 2 14:01:07 2023 -0800

    Create a span normalizer skeleton, fully implement span name normalization (#100)

commit 097ea5d
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Thu Feb 2 10:32:39 2023 -0700

    refactor(profiling)!: less chance of request double free (#103)

commit e3887c3
Author: Pawel Chojnacki <pawelchcki@gmail.com>
Date:   Mon Jan 30 18:06:46 2023 +0100

    Fix CI warnings (#104)

    * Fix warnings

    * clippy fix

commit 19b7f69
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 10:34:37 2023 -0700

    refactor(profiling)!: create FFI Error type and remove `*Result_drop` methods (#95)

    * refactor(profiling)!: create FFI Error type

    This extracts an FFI Error type `ddog_Error`. It contains an FFI Vec
    internally:

    ```rust
    /// Please treat this as opaque; do not reach into it, and especially
    /// don't write into it!
    pub struct Error {
        /// This is a String stuffed into the vec.
        message: Vec<u8>,
    }
    ```

    FFI result types have been updated e.g.:

    ```rust
    pub enum SendResult {
        HttpResponse(HttpStatus),
        Err(Error),
    }
    ```

    Instead of reaching into the buffer, use these APIs:

    ```c
    /**
     * # Safety
     * Only pass null or a valid reference to an Error.
     */
    void ddog_Error_drop(struct ddog_Error *error);

    /**
     * Returns a CharSlice of the error's message that is valid until the error
     * is dropped.
     * # Safety
     * Only pass null or a valid reference to an Error.
     */
    ddog_CharSlice ddog_Error_message(const struct ddog_Error *error);
    ```

    * Drop *Result_drop functions

    The `*Result_drop` methods have been removed:

    - ddog_prof_Exporter_NewResult_drop
    - ddog_prof_Exporter_Request_BuildResult_drop
    - ddog_prof_Exporter_SendResult_drop
    - ddog_prof_Profile_AddResult_drop
    - ddog_prof_Profile_SerializeResult_drop
    - ddog_Vec_Tag_PushResult_drop

    And these were added instead:

    - ddog_Error_drop (+ ddog_Error_message to get the message)
    - ddog_prof_EncodedProfile_drop
    - ddog_prof_Exporter_Request_drop
    * Make a note about #[must_use]

    * Add ddog_prof_EncodedProfile_drop

    Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>

commit ed1ee92
Author: Florian Engelhardt <florian.engelhardt@datadoghq.com>
Date:   Fri Jan 27 17:19:30 2023 +0100

    fix link to contribution guide (#102)

commit 1c445fe
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 09:02:54 2023 -0700

    feat(build-profiling-ffi.sh): support CARGO_TARGET_DIR (#101)

commit 88899ba
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Fri Jan 27 08:43:11 2023 -0700

    fix: clippy lints from Rust v1.67.0 release (#99)

    * fix clippy::uninlined_format_args

    * fix clippy::seek_to_start_instead_of_rewind

commit f2d0ed0
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Wed Jan 25 15:32:33 2023 -0700

    feat(profiling)!: pass errors through more FFI functions (#90)

    This changes the return types for FFI functions:
     - `ddog_prof_Profile_add`
     - `ddog_prof_Exporter_Request_build`

    Adds new structs:
     - `ddog_prof_Profile_AddResult`
     - `ddog_prof_Exporter_Request_BuildResult`

    And adds functions to drop them:
     - `ddog_prof_Profile_AddResult_drop`
     - `ddog_prof_Exporter_Request_BuildResult_drop`

    Removes a now-unnecessary newtype definition of struct
    `Request(exporter::Request)` as cbindgen handles the refactored
    code.

commit c93b7c1
Author: Levi Morrison <levi.morrison@datadoghq.com>
Date:   Mon Jan 23 10:56:21 2023 -0700

    chore: update dependencies (#96)

    * chore: update dependencies

    This fixes a dependabot alert:
    https://github.com/DataDog/libdatadog/security/dependabot/6

    With the updated dependencies, there was a new deprecation:

    > warning: use of deprecated associated function
    > `chrono::TimeZone::timestamp`: use `timestamp_opt()` instead

    I replaced it with `timestamp_opt` and unwrapped it, which is exactly
    what the now-deprecated `timestamp` function does:
    https://github.com/chronotope/chrono/blob/378efb157d674c01761f538d4450705c2b1766a4/src/offset/mod.rs#L343

    They deprecated it because they are working to not panic internally.

    * Bump LICENSE-3rdparty.yml
Merge branch 'main' into origin/david.lee/agentless-use-trace-normalizer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants