Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OneToManyRelations and VectorMembers in cloned objects #583

Merged
merged 19 commits into from
Apr 22, 2024

Conversation

jmcarcell
Copy link
Member

@jmcarcell jmcarcell commented Apr 15, 2024

The context is the following. Let's say we have a collection and each member has a OneToManyRelation to another type, for this example something that looks like a string. Let's say the first element in the collection has a relation to 'a', then the second to 'b' and so on. Each 'a', 'b', 'c', ... are in a different vector at this point. When saving the collection, all of 'a', 'b', 'c', ... go into the same vector, and there are some indexes to know which part of the vector corresponds to each object in the collection (for example for the first element it would be start=0 and end=1. When reading, each object still points to this big vector ['a', 'b', 'c', ...] so if the object is cloned the clone will also point to this big vector. If I clone the first object and then do a addToRelation('z'), the big vector will have one more element ['a', 'b', 'c', ..., 'z'] and the indexes will be updated so that now the relations of the first element will be 'a' and 'b' (start=0 and end=2), when the expected result is 'a' and 'z'.

BEGINRELEASENOTES

  • Fix OneToManyRelations and VectorMembers in cloned objects. Currently, pushing back these fields to a cloned object does not give the expected result, because the objects that we get after reading are not the same as we had before writing.
  • Add some code testing this behavior: pushing back to cloned objects that have been read from a file and also after cloning the cloned object.

ENDRELEASENOTES

Initially I implemented it (check if this is the case and if it is then make a new vector) for the clone() method but in this commit I have done it for the addX methods, since it's closer to on demand and won't make all the clone() a bit slower.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching and fixing this. This seems to mainly fix things for cloning things that are read from file. I suppose it also works for things that have just been created in memory.

Is there an easy way to put the new tests into the Catch2 unittest harness?

python/templates/macros/implementations.jinja2 Outdated Show resolved Hide resolved
@@ -99,6 +100,11 @@ void {{ class_type }}::{{ relation.setter_name(get_syntax) }}({{ relation.full_t
{% for relation in relations %}
{% if with_adder %}
void {{ class_type }}::{{ relation.setter_name(get_syntax, is_relation=True) }}({{ relation.full_type }} component) {
if (m_obj->data.{{ relation.name }}_end != m_obj->m_{{ relation.name }}->size()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here that says why we are doing this for the future? Should we just make this copy in the clone call directly without waiting for it to trigger here? The effect would be the same, but I think the code would be easier to follow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is up to us, if it's done for every clone() call (which is implemented in this PR before 3323f23) then the object after cloning is correct and we don't have to worry anymore but then we are making more copies with every clone(), and they are not always needed if we are not modifying the vector members or one-to-many relations. If it's done in the addX then the copies will only happen a fraction of the times you clone(). For the latter, I still have to check if saving cloned objects that come from reading is OK or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case I would go for the implementation in clone because it's easier to see why it happens there. We can always come back and move it to a slightly more unintuitive place once we see performance is actually an issue. In the end for most relations we are copying a vector of pointers and most vector members are also pretty small objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at saving cloned objects and that's worse when it's implemented in the addX because after cloning each element will point to the big vector and for each element saved a big vector will also be saved (even though when reading since we have the indexes it's fine). (For example, if element 0 has a relation to 'a', 1 to 'b', 2 to 'c' and 3 to 'd', if 0 is cloned and saved then ['a', 'b', 'c', 'd'] is saved but if 0 and 1 are both cloned then ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'] is saved. So certainly having it in addX with the same behaviour as in clone() is more complicated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had this though over lunch as well. I think clone would have to copy contents to a different vector in order to make the writing side work again as well. I am currently having a look to remember which one should actually be used for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think the additional vector I was thinking about doesn't really play a role here, it's in CollectionData and should be handled correctly if we make sure to handle the vector correctly here.

In that case I agree with you and adding things in clone is the easier way, because otherwise we have to keep track of whether addX has been called and then do the copy in any case before writing in order for the rest to work as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I agree with you and adding things in clone is the easier way, because otherwise we have to keep track of whether addX has been called and then do the copy in any case before writing in order for the rest to work as expected.

I think no tracking needs to be done here. The workflow would be to have this fix in addX, then when writing we can have objects that are pointing to a big vector but we may not want to write it all and objects that are pointing to a "small" vector so we have to use the X_begin and X_end indexes to create a new big vector to only write what the objects are pointing to. Probably starting here: https://github.com/AIDASoft/podio/blob/master/python/templates/CollectionData.cc.jinja2#L134

@jmcarcell
Copy link
Member Author

I changed the fix to be in clone() and moved the tests in a unittest (maybe I should add an rntuple one...)

I suppose it also works for things that have just been created in memory.

For memory only everything works fine because there is never this conversion to a single vector, that only happens when writing

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me and I am fairly certain everything works as expected also in the case where the relations are empty (when read from file). Is it easily possible to have a test for that as well?

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

// If the current object has been read from a file, then the object may only have a slice of the relation vector
// so this slice has to be copied in case we want to modify it
if (m_obj->data.{{ relation.name }}_end - m_obj->data.{{ relation.name }}_begin != m_obj->m_{{ relation.name }}->size()) {
tmp.m_obj->m_{{ relation.name }} = new std::vector<{{ relation.full_type }}>(m_obj->m_{{ relation.name }}->begin() + m_obj->data.{{ relation.name }}_begin, m_obj->m_{{ relation.name }}->begin() + m_obj->data.{{ relation.name }}_end);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we leaking the vector that is created in the tmp constructor here by not getting rid of it explicitly? I think we are and the AddressSanitizer should in principle catch this in the unittests, but we would need to go through an I/O backend that has no memory issues. SIO and RNTuple should both work for that. Then we could remove the ASAN-FAIL (and hopefully also UBSAN-FAIL labels in the tests below.

Copy link
Member Author

@jmcarcell jmcarcell Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would say that no, or at least that this is the same situation as when a new object is created since then also the vector is created with new std::vector. When cloning a collection that is read, the initial big vector is still managed by the read collection and its objects and the new vector by the cloned object.
Sanitizers seem to be happy. Locally (with sanitizers) all the rntuple tests fail for me with either gcc 13 or clang 17 and ROOT 6.30.06

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think the cleanup is correct and since we run it through the sanitizers now this should be OK. I will check again locally to make sure that the tests are actually run.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Turns out the cleanup is not OK, and we actually leak the pointer from tmp. Looking at the code what happens is the following:

  • In the constructor for tmp we create a full copy of the big vector (that is managed by the collection)
    {{ obj_type }}::{{ obj_type }}(const {{ obj_type }}& other) :
    id(),
    data(other.data){{ single_relations_initialize(OneToOneRelations) }}
    {%- for relation in OneToManyRelations + VectorMembers %},
    m_{{ relation.name }}(new std::vector<{{ relation.full_type }}>(*(other.m_{{ relation.name }})))
    {%- endfor %}
  • In order to only get the slice that we want we create another vector with new (this snippet), but do not get rid of the copy we made in the first place

So we need to delete the original vector in tmp before we put in the new copy of the slice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell from a quick look the Obj(Obj const&) copy constructor is only called via clone, so we could actually be a bit more clever with the unnecessary copying and only do the necessary work in clone.

@tmadlener
Copy link
Collaborator

Can you rebase this to pick the changes from master?

Fix an issue where it's not possible to push_back to a cloned object that has
been read because all the OneToManyRelations in the same collection
point to the same vector. The fix is to actually make a copy of the
part of the vector that belongs to that object.
@jmcarcell
Copy link
Member Author

After #588 and #589 I can easily test this and now the implementation doesn't have any leaks and does the minimum possible work; only copying the elements that are being pointed to. I was going to exclude the RNTuple test from the tests with sanitizers since it fails for me, but there are others that do so they will have to be changed anyway, I think the same will be seen when ROOT 6.30.06 is available.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a few minor comments / nitpicks.

The one thing that I don't see tested is whether the cloned objects can be written properly, but since they should now be properly initialized / constructed, they should behave just like any other freshly created object, so things should just work (TM).

// so this slice has to be copied in case we want to modify it
tmp->m_{{ relation.name }}->reserve(m_obj->m_{{ relation.name }}->size());
for (size_t i = m_obj->data.{{ relation.name }}_begin; i < m_obj->data.{{ relation.name }}_end; i++) {
tmp->m_{{ relation.name }}->push_back((*m_obj->m_{{ relation.name }})[i]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tmp->m_{{ relation.name }}->push_back((*m_obj->m_{{ relation.name }})[i]);
tmp->m_{{ relation.name }}->emplace_back((*m_obj->m_{{ relation.name }})[i]);

Might be able to skip one of the checks in MabyeSharedPtr if we get a move this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this can be a move, [] should return something movable no? In any case you don't want to move since you don't want to lose the original relation after a clone()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably right, we won't get a move, but we should save creating a temporary for the push_back, and instead just get one copy constructor call.

Comment on lines +22 to +24
{% if prefix %}
return Mutable{{ type }}(podio::utils::MaybeSharedPtr(new {{ type }}Obj(*m_obj), podio::utils::MarkOwned));
{% else %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly here that if we clone a Mutable type, we are guaranteed to not have slicing and skip the rest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, you can only get the pointers to the big vector when reading and then that's a non-mutable object which is now fixed so when you go to Mutable it should be fine. Previously I think it may have been possible by non-mutable (from reading) -> mutable (first clone) -> mutable (second clone) and the second clone would maybe be pointing to the big vector.

python/templates/macros/implementations.jinja2 Outdated Show resolved Hide resolved
tests/unittests/unittest.cpp Outdated Show resolved Hide resolved
@hegner
Copy link
Collaborator

hegner commented Apr 22, 2024

Thanks! Quite important to fix this! If you fix the format check issues, I could merge it today

@tmadlener tmadlener merged commit 02a4b9d into AIDASoft:master Apr 22, 2024
16 of 18 checks passed
@veprbl
Copy link
Contributor

veprbl commented Jun 22, 2024

This caused a regression for us #631

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants