Fix OneToManyRelations and VectorMembers in cloned objects #583

jmcarcell · 2024-04-15T08:14:18Z

The context is the following. Say we have a collection where each member has a OneToManyRelation to another type; for this example, something that looks like a string. The first element in the collection has a relation to 'a', the second to 'b', and so on. At this point each of 'a', 'b', 'c', ... lives in a different vector. When the collection is saved, all of 'a', 'b', 'c', ... go into the same vector, and indexes record which part of that vector belongs to each object in the collection (for the first element, start=0 and end=1). After reading, each object still points to this big vector ['a', 'b', 'c', ...], so a clone of an object also points to it. If I clone the first object and then call addToRelation('z'), the big vector gains one element, ['a', 'b', 'c', ..., 'z'], and the indexes are updated so that the relations of the cloned first element are now 'a' and 'b' (start=0 and end=2), when the expected result is 'a' and 'z'.
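The scenario above can be reproduced in a small toy model. This is only a sketch under stated assumptions: `ToyObject`, `visible`, `addRelation`, `sharedClone` and `slicedClone` are hypothetical names for illustration and are not part of the generated podio code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated object: a [begin, end) window into a
// relation vector that may be shared with other objects of the collection.
struct ToyObject {
  std::size_t begin;
  std::size_t end;
  std::vector<std::string>* relations; // not owned in this toy model
};

// The relations an object sees through its window.
std::vector<std::string> visible(const ToyObject& obj) {
  assert(obj.begin <= obj.end && obj.end <= obj.relations->size());
  return std::vector<std::string>(obj.relations->begin() + obj.begin,
                                  obj.relations->begin() + obj.end);
}

// Adder as described above: append to the vector, then widen the window.
void addRelation(ToyObject& obj, const std::string& value) {
  obj.relations->push_back(value);
  ++obj.end;
}

// Clone that keeps pointing to the shared vector (the behaviour being fixed).
ToyObject sharedClone(const ToyObject& obj) {
  return obj;
}

// Clone that copies only its own slice into caller-provided storage.
ToyObject slicedClone(const ToyObject& obj, std::vector<std::string>& storage) {
  storage = visible(obj);
  return ToyObject{0, storage.size(), &storage};
}
```

With a big vector ['a', 'b', 'c'] and a first element with window [0, 1), adding 'z' to a `sharedClone` makes the clone see 'a' and 'b', while adding 'z' to a `slicedClone` makes it see 'a' and 'z' and leaves the big vector untouched.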

BEGINRELEASENOTES

Fix OneToManyRelations and VectorMembers in cloned objects. Currently, pushing back these fields to a cloned object does not give the expected result, because the objects that we get after reading are not the same as we had before writing.
Add some code testing this behavior: pushing back to cloned objects that have been read from a file and also after cloning the cloned object.

ENDRELEASENOTES

Initially I implemented it in the clone() method (check whether this is the case and, if so, make a new vector), but in this commit I have moved it to the addX methods, since that is closer to on-demand and avoids making every clone() call a bit slower.

tmadlener

Thanks for catching and fixing this. This seems to mainly fix things for cloning things that are read from file. I suppose it also works for things that have just been created in memory.

Is there an easy way to put the new tests into the Catch2 unittest harness?

tmadlener · 2024-04-15T09:00:14Z

python/templates/macros/implementations.jinja2

@@ -99,6 +100,11 @@ void {{ class_type }}::{{ relation.setter_name(get_syntax) }}({{ relation.full_t
 {% for relation in relations %}
 {% if with_adder %}
 void {{ class_type }}::{{ relation.setter_name(get_syntax, is_relation=True) }}({{ relation.full_type }} component) {
+  if (m_obj->data.{{ relation.name }}_end != m_obj->m_{{ relation.name }}->size()) {


Can you add a comment here that says why we are doing this for the future? Should we just make this copy in the clone call directly without waiting for it to trigger here? The effect would be the same, but I think the code would be easier to follow.

This is up to us. If it's done for every clone() call (which is how this PR was implemented before 3323f23), then the object is correct after cloning and we don't have to worry anymore, but we make more copies with every clone(), and they are not always needed if we are not modifying the vector members or one-to-many relations. If it's done in addX, the copies only happen for a fraction of the clone() calls. For the latter, I still have to check whether saving cloned objects that come from reading is OK.

I think in this case I would go for the implementation in clone because it's easier to see why it happens there. We can always come back and move it to a slightly more unintuitive place once we see performance is actually an issue. In the end for most relations we are copying a vector of pointers and most vector members are also pretty small objects.

I had a look at saving cloned objects, and that's worse when the fix is implemented in addX: after cloning, each element points to the big vector, and for each element saved a big vector is also saved (even though reading is fine, since we have the indexes). For example, if element 0 has a relation to 'a', 1 to 'b', 2 to 'c' and 3 to 'd', then if 0 is cloned and saved, ['a', 'b', 'c', 'd'] is saved, but if 0 and 1 are both cloned, ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'] is saved. So having it in addX with the same behaviour as in clone() is certainly more complicated.

Yeah, I had this thought over lunch as well. I think clone would have to copy contents to a different vector in order to make the writing side work again as well. I am currently having a look to remember which one should actually be used for this.

OK. I think the additional vector I was thinking about doesn't really play a role here, it's in CollectionData and should be handled correctly if we make sure to handle the vector correctly here.

In that case I agree with you and adding things in clone is the easier way, because otherwise we have to keep track of whether addX has been called and then do the copy in any case before writing in order for the rest to work as expected.

> In that case I agree with you and adding things in clone is the easier way, because otherwise we have to keep track of whether addX has been called and then do the copy in any case before writing in order for the rest to work as expected.

I think no tracking needs to be done here. The workflow would be to have this fix in addX; then, when writing, we can have objects that point to a big vector (which we may not want to write in full) and objects that point to a "small" vector, so we have to use the X_begin and X_end indexes to create a new big vector that contains only what the objects are pointing to. Probably starting here: https://github.com/AIDASoft/podio/blob/master/python/templates/CollectionData.cc.jinja2#L134
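As a rough illustration of the two writing strategies discussed in this thread, here is a toy sketch. It is not the code in CollectionData.cc.jinja2; `ToyObject`, `gatherWhole` and `gatherWindows` are hypothetical names, and the model assumes each object only carries a [begin, end) window plus a pointer to its relation vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical object: a [begin, end) window into a possibly shared vector.
struct ToyObject {
  std::size_t begin;
  std::size_t end;
  const std::vector<std::string>* relations;
};

// Appends each object's whole vector: clones that still point to the big
// vector cause it to be written once per clone.
std::vector<std::string> gatherWhole(const std::vector<ToyObject>& objects) {
  std::vector<std::string> out;
  for (const auto& obj : objects) {
    out.insert(out.end(), obj.relations->begin(), obj.relations->end());
  }
  return out;
}

// Appends only the window of each object, so only the elements the objects
// actually point to end up in the new big vector.
std::vector<std::string> gatherWindows(const std::vector<ToyObject>& objects) {
  std::vector<std::string> out;
  for (const auto& obj : objects) {
    assert(obj.begin <= obj.end && obj.end <= obj.relations->size());
    out.insert(out.end(), obj.relations->begin() + obj.begin,
               obj.relations->begin() + obj.end);
  }
  return out;
}
```

For two clones of elements 0 and 1 that both point to ['a', 'b', 'c', 'd'], `gatherWhole` produces the eight-element vector from the example above, whereas `gatherWindows` produces ['a', 'b'].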

jmcarcell · 2024-04-15T13:29:25Z

I changed the fix to be in clone() and moved the tests into the unit tests (maybe I should add an rntuple one...)

> I suppose it also works for things that have just been created in memory.

For memory only, everything works fine because there is never this conversion to a single vector; that only happens when writing.

tmadlener

This looks good to me and I am fairly certain everything works as expected also in the case where the relations are empty (when read from file). Is it easily possible to have a test for that as well?

tmadlener

Thanks :)

tmadlener · 2024-04-15T19:54:56Z

python/templates/macros/implementations.jinja2

+  // If the current object has been read from a file, then the object may only have a slice of the relation vector
+  // so this slice has to be copied in case we want to modify it
+  if (m_obj->data.{{ relation.name }}_end - m_obj->data.{{ relation.name }}_begin != m_obj->m_{{ relation.name }}->size()) {
+    tmp.m_obj->m_{{ relation.name }} = new std::vector<{{ relation.full_type }}>(m_obj->m_{{ relation.name }}->begin() + m_obj->data.{{ relation.name }}_begin, m_obj->m_{{ relation.name }}->begin() + m_obj->data.{{ relation.name }}_end);


Are we leaking the vector that is created in the tmp constructor here by not getting rid of it explicitly? I think we are, and the AddressSanitizer should in principle catch this in the unittests, but we would need to go through an I/O backend that has no memory issues. SIO and RNTuple should both work for that. Then we could remove the ASAN-FAIL (and hopefully also UBSAN-FAIL) labels in the tests below.

Here I would say no, or at least that this is the same situation as when a new object is created, since then the vector is also created with new std::vector. When cloning a collection that is read, the initial big vector is still managed by the read collection and its objects, and the new vector by the cloned object.
Sanitizers seem to be happy. Locally (with sanitizers) all the rntuple tests fail for me with either gcc 13 or clang 17 and ROOT 6.30.06.

Makes sense. I think the cleanup is correct and since we run it through the sanitizers now this should be OK. I will check again locally to make sure that the tests are actually run.

OK. Turns out the cleanup is not OK, and we actually leak the pointer from tmp. Looking at the code what happens is the following:

In the constructor for tmp we create a full copy of the big vector (that is managed by the collection)

podio/python/templates/Obj.cc.jinja2

Lines 32 to 37 in 43330a8

    {{ obj_type }}::{{ obj_type }}(const {{ obj_type }}& other) :
    id(),
    data(other.data){{ single_relations_initialize(OneToOneRelations) }}
    {%- for relation in OneToManyRelations + VectorMembers %},
    m_{{ relation.name }}(new std::vector<{{ relation.full_type }}>(*(other.m_{{ relation.name }})))
    {%- endfor %}

In order to only get the slice that we want, we create another vector with new (this snippet), but do not get rid of the copy we made in the first place.

So we need to delete the original vector in tmp before we put in the new copy of the slice.

As far as I can tell from a quick look the Obj(Obj const&) copy constructor is only called via clone, so we could actually be a bit more clever with the unnecessary copying and only do the necessary work in clone.
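The leak described in this comment can be mimicked in a toy model. The following is only a sketch under stated assumptions: `ToyObj`, `makeVector`, `releaseVector`, `takeSliceLeaky`, `takeSlice` and the `liveVectors` counter are hypothetical and only mirror the shape of the generated copy constructor, not its actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of vectors allocated through makeVector and not yet released.
static int liveVectors = 0;

std::vector<int>* makeVector(const std::vector<int>& content) {
  ++liveVectors;
  return new std::vector<int>(content);
}

void releaseVector(std::vector<int>* vec) {
  --liveVectors;
  delete vec;
}

// Hypothetical object that owns its relation vector through a raw pointer.
struct ToyObj {
  std::size_t begin;
  std::size_t end;
  std::vector<int>* rel;

  ToyObj(std::size_t b, std::size_t e, const std::vector<int>& content)
      : begin(b), end(e), rel(makeVector(content)) {}
  // Like the template above: the copy constructor copies the whole vector.
  ToyObj(const ToyObj& other)
      : begin(other.begin), end(other.end), rel(makeVector(*other.rel)) {}
  ToyObj& operator=(const ToyObj&) = delete;
  ~ToyObj() { releaseVector(rel); }
};

std::vector<int> sliceOf(const ToyObj& obj) {
  return std::vector<int>(obj.rel->begin() + obj.begin, obj.rel->begin() + obj.end);
}

// Overwrites the pointer: the full copy made by the copy constructor is lost.
void takeSliceLeaky(ToyObj& tmp, const ToyObj& src) {
  tmp.rel = makeVector(sliceOf(src));
  tmp.begin = 0;
  tmp.end = tmp.rel->size();
}

// Releases the full copy before installing the slice.
void takeSlice(ToyObj& tmp, const ToyObj& src) {
  std::vector<int>* fresh = makeVector(sliceOf(src));
  releaseVector(tmp.rel);
  tmp.rel = fresh;
  tmp.begin = 0;
  tmp.end = tmp.rel->size();
}
```

After copying an object and calling `takeSliceLeaky`, three vectors are alive for two objects; with `takeSlice` there are two. Constructing the clone directly from the slice, as suggested above, would avoid the intermediate full copy altogether.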

tmadlener · 2024-04-19T10:46:52Z

Can you rebase this to pick the changes from master?

Fix an issue where it's not possible to push_back to a cloned object that has been read because all the OneToManyRelations in the same collection point to the same vector. The fix is to actually make a copy of the part of the vector that belongs to that object.

jmcarcell · 2024-04-19T13:05:09Z

After #588 and #589 I can easily test this, and now the implementation doesn't have any leaks and does the minimum possible work: only copying the elements that are being pointed to. I was going to exclude the RNTuple test from the tests with sanitizers since it fails for me, but there are others that fail too, so they will have to be changed anyway; I think the same will be seen when ROOT 6.30.06 is available.

tmadlener

I just have a few minor comments / nitpicks.

The one thing that I don't see tested is whether the cloned objects can be written properly, but since they should now be properly initialized / constructed, they should behave just like any other freshly created object, so things should just work (TM).

tmadlener · 2024-04-19T17:04:29Z

python/templates/macros/implementations.jinja2

+  // so this slice has to be copied in case we want to modify it
+  tmp->m_{{ relation.name }}->reserve(m_obj->m_{{ relation.name }}->size());
+  for (size_t i = m_obj->data.{{ relation.name }}_begin; i < m_obj->data.{{ relation.name }}_end; i++) {
+    tmp->m_{{ relation.name }}->push_back((*m_obj->m_{{ relation.name }})[i]);


Suggested change

-    tmp->m_{{ relation.name }}->push_back((*m_obj->m_{{ relation.name }})[i]);
+    tmp->m_{{ relation.name }}->emplace_back((*m_obj->m_{{ relation.name }})[i]);

Might be able to skip one of the checks in MaybeSharedPtr if we get a move this way.

I'm not sure this can be a move; [] should return something movable, no? In any case you don't want to move, since you don't want to lose the original relation after a clone().

You are probably right, we won't get a move, but we should save creating a temporary for the push_back, and instead just get one copy constructor call.
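One way to check how many constructor calls each spelling makes is to count them with a toy element type. This is a sketch under stated assumptions: `Counted`, `Counts`, `fillWithPushBack` and `fillWithEmplaceBack` are hypothetical and stand in for the relation handle type. In this toy setup, where the argument is an lvalue of the element type and the target has reserved enough capacity, both calls perform one copy construction per element and no moves.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how it gets constructed.
struct Counted {
  static int copies;
  static int moves;
  Counted() = default;
  Counted(const Counted&) { ++copies; }
  Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

struct Counts {
  int copies;
  int moves;
};

// Fills a new vector with push_back and reports the constructor calls.
Counts fillWithPushBack(const std::vector<Counted>& source) {
  std::vector<Counted> target;
  target.reserve(source.size()); // no reallocation while filling
  Counted::copies = 0;
  Counted::moves = 0;
  for (const auto& element : source) {
    target.push_back(element);
  }
  return Counts{Counted::copies, Counted::moves};
}

// Same as above, but with emplace_back.
Counts fillWithEmplaceBack(const std::vector<Counted>& source) {
  std::vector<Counted> target;
  target.reserve(source.size()); // no reallocation while filling
  Counted::copies = 0;
  Counted::moves = 0;
  for (const auto& element : source) {
    target.emplace_back(element);
  }
  return Counts{Counted::copies, Counted::moves};
}
```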

tmadlener · 2024-04-19T17:13:08Z

python/templates/macros/implementations.jinja2

+{% if prefix %}
  return Mutable{{ type }}(podio::utils::MaybeSharedPtr(new {{ type }}Obj(*m_obj), podio::utils::MarkOwned));
+{% else %}


Do I understand correctly here that if we clone a Mutable type, we are guaranteed to not have slicing and skip the rest?

I think so, you can only get the pointers to the big vector when reading and then that's a non-mutable object which is now fixed so when you go to Mutable it should be fine. Previously I think it may have been possible by non-mutable (from reading) -> mutable (first clone) -> mutable (second clone) and the second clone would maybe be pointing to the big vector.

hegner · 2024-04-22T07:20:30Z

Thanks! Quite important to fix this! If you fix the format check issues, I could merge it today

veprbl · 2024-06-22T18:24:22Z

This caused a regression for us #631

tmadlener reviewed Apr 15, 2024

tmadlener approved these changes Apr 15, 2024

tmadlener reviewed Apr 15, 2024

jmcarcell mentioned this pull request Apr 18, 2024

Fix leak in the buffer vectorMembers when reading SIO frames #589

Merged

jmcarcell added 11 commits April 19, 2024 14:32

Fix also vector members

d2427fc

Clone twice when reading

c90c87b

Implement the fix in the addX methods

364c190

Fix formatting

ba04309

Implement the fix in clone()

51e078c

Add comment and tests to unittests

68ea5d6

Add a test for a collection with empty relations

ce82b4b

Fix format

343601c

Fix check

b5655af

Add test with rntuple

ebe00aa

jmcarcell force-pushed the relations branch from 24425fc to 8637d97 Compare April 19, 2024 12:54

jmcarcell added 3 commits April 19, 2024 14:54

Fix a leak in the implementation

8637d97

Use a for loop

98703c6

Call reserve first

0eec465

Add test case for SIO

d4aa198

tmadlener reviewed Apr 19, 2024

jmcarcell added 3 commits April 22, 2024 08:45

Use a constructor and emplace_back

a908612

Add the [basics] label

ac5d4a0

Add test reading cloned objects

68630b3

Fix style

19bf008

tmadlener merged commit 02a4b9d into AIDASoft:master Apr 22, 2024
16 of 18 checks passed

jmcarcell mentioned this pull request May 29, 2024

Add an Overlay algorithm key4hep/k4Reco#2

Merged

jmcarcell mentioned this pull request Jun 23, 2024

Fix one to one relations for cloned objects #632

Merged
