Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROOTFrameWriter can produce unreadable files without warning #382

Closed
tmadlener opened this issue Feb 27, 2023 · 8 comments · Fixed by #513
Closed

ROOTFrameWriter can produce unreadable files without warning #382

tmadlener opened this issue Feb 27, 2023 · 8 comments · Fixed by #513

Comments

@tmadlener
Copy link
Collaborator

The documentation states that the contents of the first Frame of a category determine the contents of all Frames of a given category:

/** Store the given frame with the given category. Store all available
* collections from the Frame.
*
* NOTE: The contents of the first Frame that is written in this way
* determines the contents that will be written for all subsequent Frames.
*/
void writeFrame(const podio::Frame& frame, const std::string& category);

However, there is no currently no warning in case this happens, and the writer will simply ignore additional collections. This also has the potential to break the reader side, especially if the missing collections are not simply the last ones to be added to the frame, e.g.

#include "write_frame.h"  // from the tests directory for easier creation of collections

#include "podio/Frame.h"
#include "podio/ROOTFrameWriter.h"

int main() {
  auto writer = podio::ROOTFrameWriter("broken_frame.root");

  auto frame = podio::Frame();
  frame.put(createHitCollection(10), "hits");
  frame.put(createMCCollection(), "mcparticles");
  writer.writeFrame(frame, "events");

  frame = podio::Frame();
  const auto& hits = frame.put(createHitCollection(10), "hits");
  frame.put(createClusterCollection(hits), "clusters");
  frame.put(createMCCollection(), "mcparticles");
  writer.writeFrame(frame, "events");

  writer.finish();

  return 0;
}

produces a file for which

podio-dump -e1 broken_frame.root

will run into a segmentation fault, because the collection ID for that is necessary for resolving the relations of the ExampleMCs has changed between the frames.

The cause for the segmentation fault is actually a different issue (#381). Nevertheless, there should probably at least be a warning if different contents are present in Frames that should be written with the ROOTFrameWriter.

@hegner
Copy link
Collaborator

hegner commented Jul 19, 2023

@tmadlener - after moving to hashes - how relevant is this now? so should it be a strict requirement of PODIO or just a policy of us in Key4hep?
There are some online use cases where different event contents actually make sense.

@tmadlener
Copy link
Collaborator Author

It is in principle a strict requirement of the ROOTFrameWriter. Things might work if collections are missing in events after the first, but new collections in all but the first event will still crash the whole thing.

The hashes avoid creating unreadable files, but the contents will still be potentially unexpected.

@hegner
Copy link
Collaborator

hegner commented Jul 19, 2023

That means it is limited by implementation (setup of the branches etc), not a design decision. Just for future reference in case we want to change the policy
For now, I can check whether I can play with our map and "freeze" it for further inserts. And trigger an exception in case.

@tmadlener
Copy link
Collaborator Author

Well technically it is an implementation that is forced onto podio by a design decision of ROOT. I don't think we can ever change policy here. It might work as expected with TTrees but we cannot guarantee that as this is entirely dependent on what the users actually want to do. With RNTuple there is no wiggle room any longer as you have to close the model you want to write before you first write anything with it.

The simplest (and most likely sufficient) check for now would be to see whether the collections that should be written are the same as the ones that are available from the collection id table that was used for the first write.

@hegner
Copy link
Collaborator

hegner commented Jul 19, 2023

Yap. I was thinking of something like that. Looking at the map. Will put it on my todo list

@hegner hegner self-assigned this Jul 19, 2023
@hegner
Copy link
Collaborator

hegner commented Jul 20, 2023

It seems the case of writing out something that hasn't been defined in the first event will not lead to a crash any more. Since the writing is based on the catInfo which is initialized from the first collsToWrite. The only thing that can happen is missing collections in later events. Need to add a test to handle that.
I am not sure we can easily warn about collections that are not written out. Since this is a feature.
I was thinking about the option to change the interface a bit so that the implicit ROOTFrameWriter::writeFrame w/o the collsToWrite parameter only works in later calls, and we introduce a special kind of firstWrite. But that would make the interface quite ugly.

@tmadlener
Copy link
Collaborator Author

What happens in case there is a collection missing in the frame in the second event, i.e. something where we switch the two Frames in the example given in this issue? Does that crash, or will we simply write an empty collection?

@hegner
Copy link
Collaborator

hegner commented Jul 20, 2023

See above. Will integrate a test.
However, the reading is then not well defined. And we should decide on the policy there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants