Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rerun::Collection borrows data too eagerly, making it very easy to cause segfaults & read of invalid data #7081

Open
facontidavide opened this issue Aug 6, 2024 · 6 comments
Labels
😤 annoying Something in the UI / SDK is annoying to use 🪳 bug Something isn't working 🌊 C++ API C/C++ API specific 💣 crash crash, deadlock/freeze, do-no-start enhancement New feature or request

Comments

@facontidavide
Copy link

facontidavide commented Aug 6, 2024

Describe the bug

I have a class like this:

rerun::Boxes2D createBoxes(....) {
  std::vector<rerun::Position2D> centers;
  std::vector<rerun::datatypes::Vec2D> sizes;
  // fill the vectors
  return rerun::Boxes2D::from_centers_and_sizes(centers, sizes);
}

Unfortunately, the result seems to be corrupted, ore specifically the centers (half-sizes, seems to be ok).
I believe that the problem is that the Collection is borrowing the reference to the centers, instead of taking owership.

The following code fixes the issue:

rerun::Boxes2D createBoxes(....) {
  std::vector<rerun::Position2D> centers;
  std::vector<rerun::datatypes::Vec2D> sizes;
  // fill the vectors
  auto boxes rerun::Boxes2D::from_sizes(sizes);
  boxes.centers = rerun::Collection<rerun::Position2D>::take_ownership(std::move(centers));
  return boxes;
}

To Reproduce
Try the function mentioned above and then print the centers:

auto boxes = risc::createBoxes(....);
for (const auto& center : *(boxes.centers)) {
  std::cout << center.x() << " / " << center.y() << "\n";
}

Rerun version
0.17.0

@facontidavide facontidavide added 👀 needs triage This issue needs to be triaged by the Rerun team 🪳 bug Something isn't working labels Aug 6, 2024
@emilk emilk added 🌊 C++ API C/C++ API specific and removed 👀 needs triage This issue needs to be triaged by the Rerun team labels Aug 6, 2024
@Wumpf
Copy link
Member

Wumpf commented Aug 6, 2024

This is pretty much by design and document as such (although maybe not easy to find :/).
rerun::Collection tries to borrow whenever possible to avoid copies. The way this works is that everything that isn't created from a temporary (r value reference) will go down the borrow path of the collection.

That said, I'm starting to regret this design. It's great to have the option of borrowing data, but we do it too aggressively. This isn't Rust where we can do this kind of operation safely, instead this makes everything very easy to fail as you've experienced.
I think we should change rerun::Collection to borrow only on explicit codepath. This will be a breaking change for the container adapter system which right now is based on the premise of behaving like this with arbitrary types - instead we need to restyle this to have adapters that interact with explicit borrow methods.

Somewhat related to this recent change here where I made it easier to do explicit borrow & ownership taking.

@Wumpf Wumpf changed the title Container doesn't take ownership as expected (borrows from temporary objects) rerun::Collection borrows data too eagerly, making it very easy to cause segfaults & read of invalid data Aug 6, 2024
@Wumpf Wumpf added enhancement New feature or request 😤 annoying Something in the UI / SDK is annoying to use labels Aug 6, 2024
@facontidavide
Copy link
Author

facontidavide commented Aug 7, 2024

Yes, I can understand that you are trying to import good ideas from Rust, but without the compiler support, you CANNOT do that.

My suggestion is to make the C++ API more idiomatic, using a more idiomatic C++ API.

I suggest adding these constructors

// zero-cost move semantic. Probably what I would propose by default in the tutorials
Collection(std::vector<T>&& vect)

// Reference to extenal object. Everyone understands that Collection is NOT taking ownership of this object
// and that the caller is responsible for its life-time.
// Looks unsafe, AS IT SHOULD
Collection(const std::vector<T>* vect)

// The only safe option is to make a copy. No borrowing
Collection(const std::vector<T>& vect);

@Wumpf
Copy link
Member

Wumpf commented Aug 7, 2024

I'm not entirely sold on the idea that passing a pointer signifies borrowing data of the underlying vector well enough. Since rerun collection takes internally a pointer to vect.data() this still suffers from iterator-invalidation (i.e. resize of the vector break the collection). I'd rather make it all the way explicit.
Passing a pointer to the raw data itself == borrow makes more sense for me though as there's then no obvious way of deallocating the vector. I.e. I'm on board with promoting this method overload to a constructor

Also note that these two

Collection(std::vector<T>&& vect);
Collection(const std::vector<T>& vect);

are more or less redundant to each other an can be replaced with

Collection(std::vector<T> vect);

which then internally always moves.
Rationale:

  • If an r-value reference is passed the vector is moved twice (considered cheap): once to create an std::vector (which has a move constructor ofc) and once to move it inside of the Collection
  • If a reference/l-value is passed the vector is copied (unavoidable) and then moved (cheap).

We use that pattern in a lot of places already to cull down on the number of constructors/builder methods.

@Wumpf
Copy link
Member

Wumpf commented Aug 7, 2024

For posterity and in defense of how rerun::Collection eagerly borrows: The archetype types were not meant as a storage object in any of the SDKs and should ideally only ever exist as a temporary in a log call. Then the entire chain works out since user data gets adapted on the fly to a collection which then is temporarily stored in the archetype which then is logged/serialized.
Practically speaking though that's .. often impractical 😅 beyond simple samples.

The idea of "adapt data on the fly without copy" is the reason rerun::Collection came to be in the first place

@facontidavide
Copy link
Author

facontidavide commented Aug 7, 2024

Practically speaking though that's .. often impractical 😅 beyond simple samples.

Yes, this is why I started creating this helper functions to make the code nicer 😉

You are correct about the fact that Collection(std::vector<T> vect) may kill two birds with a stone.

The constructor

explicit Collection(const std::vector<T>* vect);

Are intentionally "ugly" because most C++ developer will look at that and a big red flag will light in their brain.

That is specifically what I want 😄

@emilk
Copy link
Member

emilk commented Aug 8, 2024

The "pointer-means-borrowing is a neat idea, but it leaves the problem of a user now being able to pass in NULL/nullptr, but I guess we can just define that as meaning "empty collection".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
😤 annoying Something in the UI / SDK is annoying to use 🪳 bug Something isn't working 🌊 C++ API C/C++ API specific 💣 crash crash, deadlock/freeze, do-no-start enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants