-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix OneToManyRelations and VectorMembers in cloned objects #583
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching and fixing this. This seems to mainly fix things for cloning things that are read from file. I suppose it also works for things that have just been created in memory.
Is there an easy way to put the new tests into the Catch2 unittest harness?
@@ -99,6 +100,11 @@ void {{ class_type }}::{{ relation.setter_name(get_syntax) }}({{ relation.full_t | |||
{% for relation in relations %} | |||
{% if with_adder %} | |||
void {{ class_type }}::{{ relation.setter_name(get_syntax, is_relation=True) }}({{ relation.full_type }} component) { | |||
if (m_obj->data.{{ relation.name }}_end != m_obj->m_{{ relation.name }}->size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here that says why we are doing this for the future? Should we just make this copy in the clone
call directly without waiting for it to trigger here? The effect would be the same, but I think the code would be easier to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is up to us, if it's done for every clone()
call (which is implemented in this PR before 3323f23) then the object after cloning is correct and we don't have to worry anymore but then we are making more copies with every clone()
, and they are not always needed if we are not modifying the vector members or one-to-many relations. If it's done in the addX
then the copies will only happen a fraction of the times you clone()
. For the latter, I still have to check if saving cloned objects that come from reading is OK or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case I would go for the implementation in clone
because it's easier to see why it happens there. We can always come back and move it to a slightly more unintuitive place once we see performance is actually an issue. In the end for most relations we are copying a vector of pointers and most vector members are also pretty small objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look at saving cloned objects and that's worse when it's implemented in the addX
because after cloning each element will point to the big vector and for each element saved a big vector will also be saved (even though when reading since we have the indexes it's fine). (For example, if element 0 has a relation to 'a', 1 to 'b', 2 to 'c' and 3 to 'd', if 0 is cloned and saved then ['a', 'b', 'c', 'd'] is saved but if 0 and 1 are both cloned then ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'] is saved. So certainly having it in addX
with the same behaviour as in clone()
is more complicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had this though over lunch as well. I think clone
would have to copy contents to a different vector in order to make the writing side work again as well. I am currently having a look to remember which one should actually be used for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think the additional vector I was thinking about doesn't really play a role here, it's in CollectionData
and should be handled correctly if we make sure to handle the vector correctly here.
In that case I agree with you and adding things in clone
is the easier way, because otherwise we have to keep track of whether addX
has been called and then do the copy in any case before writing in order for the rest to work as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I agree with you and adding things in
clone
is the easier way, because otherwise we have to keep track of whetheraddX
has been called and then do the copy in any case before writing in order for the rest to work as expected.
I think no tracking needs to be done here. The workflow would be to have this fix in addX
, then when writing we can have objects that are pointing to a big vector but we may not want to write it all and objects that are pointing to a "small" vector so we have to use the X_begin
and X_end
indexes to create a new big vector to only write what the objects are pointing to. Probably starting here: https://github.com/AIDASoft/podio/blob/master/python/templates/CollectionData.cc.jinja2#L134
I changed the fix to be in
For memory only everything works fine because there is never this conversion to a single vector, that only happens when writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me and I am fairly certain everything works as expected also in the case where the relations are empty (when read from file). Is it easily possible to have a test for that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
// If the current object has been read from a file, then the object may only have a slice of the relation vector | ||
// so this slice has to be copied in case we want to modify it | ||
if (m_obj->data.{{ relation.name }}_end - m_obj->data.{{ relation.name }}_begin != m_obj->m_{{ relation.name }}->size()) { | ||
tmp.m_obj->m_{{ relation.name }} = new std::vector<{{ relation.full_type }}>(m_obj->m_{{ relation.name }}->begin() + m_obj->data.{{ relation.name }}_begin, m_obj->m_{{ relation.name }}->begin() + m_obj->data.{{ relation.name }}_end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we leaking the vector
that is created in the tmp
constructor here by not getting rid of it explicitly? I think we are and the AddressSanitizer should in principle catch this in the unittests, but we would need to go through an I/O backend that has no memory issues. SIO and RNTuple should both work for that. Then we could remove the ASAN-FAIL
(and hopefully also UBSAN-FAIL
labels in the tests below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would say that no, or at least that this is the same situation as when a new object is created since then also the vector is created with new std::vector
. When cloning a collection that is read, the initial big vector is still managed by the read collection and its objects and the new vector by the cloned object.
Sanitizers seem to be happy. Locally (with sanitizers) all the rntuple tests fail for me with either gcc 13 or clang 17 and ROOT 6.30.06
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I think the cleanup is correct and since we run it through the sanitizers now this should be OK. I will check again locally to make sure that the tests are actually run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Turns out the cleanup is not OK, and we actually leak the pointer from tmp
. Looking at the code what happens is the following:
- In the constructor for
tmp
we create a full copy of the big vector (that is managed by the collection)
podio/python/templates/Obj.cc.jinja2
Lines 32 to 37 in 43330a8
{{ obj_type }}::{{ obj_type }}(const {{ obj_type }}& other) : id(), data(other.data){{ single_relations_initialize(OneToOneRelations) }} {%- for relation in OneToManyRelations + VectorMembers %}, m_{{ relation.name }}(new std::vector<{{ relation.full_type }}>(*(other.m_{{ relation.name }}))) {%- endfor %} - In order to only get the slice that we want we create another vector with
new
(this snippet), but do not get rid of the copy we made in the first place
So we need to delete
the original vector in tmp
before we put in the new copy of the slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell from a quick look the Obj(Obj const&)
copy constructor is only called via clone
, so we could actually be a bit more clever with the unnecessary copying and only do the necessary work in clone
.
Can you rebase this to pick the changes from master? |
Fix an issue where it's not possible to push_back to a cloned object that has been read because all the OneToManyRelations in the same collection point to the same vector. The fix is to actually make a copy of the part of the vector that belongs to that object.
After #588 and #589 I can easily test this and now the implementation doesn't have any leaks and does the minimum possible work; only copying the elements that are being pointed to. I was going to exclude the RNTuple test from the tests with sanitizers since it fails for me, but there are others that do so they will have to be changed anyway, I think the same will be seen when ROOT 6.30.06 is available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a few minor comments / nitpicks.
The one thing that I don't see tested is whether the clone
d objects can be written properly, but since they should now be properly initialized / constructed, they should behave just like any other freshly created object, so things should just work (TM).
// so this slice has to be copied in case we want to modify it | ||
tmp->m_{{ relation.name }}->reserve(m_obj->m_{{ relation.name }}->size()); | ||
for (size_t i = m_obj->data.{{ relation.name }}_begin; i < m_obj->data.{{ relation.name }}_end; i++) { | ||
tmp->m_{{ relation.name }}->push_back((*m_obj->m_{{ relation.name }})[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tmp->m_{{ relation.name }}->push_back((*m_obj->m_{{ relation.name }})[i]); | |
tmp->m_{{ relation.name }}->emplace_back((*m_obj->m_{{ relation.name }})[i]); |
Might be able to skip one of the checks in MabyeSharedPtr
if we get a move this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this can be a move, []
should return something movable no? In any case you don't want to move since you don't want to lose the original relation after a clone()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably right, we won't get a move, but we should save creating a temporary for the push_back
, and instead just get one copy constructor call.
{% if prefix %} | ||
return Mutable{{ type }}(podio::utils::MaybeSharedPtr(new {{ type }}Obj(*m_obj), podio::utils::MarkOwned)); | ||
{% else %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly here that if we clone
a Mutable
type, we are guaranteed to not have slicing and skip the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, you can only get the pointers to the big vector when reading and then that's a non-mutable object which is now fixed so when you go to Mutable it should be fine. Previously I think it may have been possible by non-mutable (from reading) -> mutable (first clone) -> mutable (second clone) and the second clone would maybe be pointing to the big vector.
Thanks! Quite important to fix this! If you fix the format check issues, I could merge it today |
This caused a regression for us #631 |
The context is the following. Let's say we have a collection and each member has a OneToManyRelation to another type, for this example something that looks like a string. Let's say the first element in the collection has a relation to 'a', then the second to 'b' and so on. Each 'a', 'b', 'c', ... are in a different vector at this point. When saving the collection, all of 'a', 'b', 'c', ... go into the same vector, and there are some indexes to know which part of the vector corresponds to each object in the collection (for example for the first element it would be
start=0
andend=1
. When reading, each object still points to this big vector ['a', 'b', 'c', ...] so if the object is cloned the clone will also point to this big vector. If I clone the first object and then do aaddToRelation('z')
, the big vector will have one more element ['a', 'b', 'c', ..., 'z'] and the indexes will be updated so that now the relations of the first element will be 'a' and 'b' (start=0
andend=2
), when the expected result is 'a' and 'z'.BEGINRELEASENOTES
ENDRELEASENOTES
Initially I implemented it (check if this is the case and if it is then make a new vector) for the
clone()
method but in this commit I have done it for theaddX
methods, since it's closer to on demand and won't make all theclone()
a bit slower.