Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hook to inject datamodel version information into podio internals #651

Merged
merged 18 commits into from
Sep 10, 2024

Conversation

tmadlener
Copy link
Collaborator

@tmadlener tmadlener commented Jul 31, 2024

BEGINRELEASENOTES

  • Make it possible to pass the datamodel version (which does not have to be the schema version!) to the class generator to inject it into the podio internals. This will automatically be stored in all output files, such that this information can also be obtained from the files.

ENDRELEASENOTES

@peremato this would make e.g. the EDM4hep version appear in the output files (once the few lines are added to the edm4hep.yaml file and its version header).

Conceptually, this relies on the fact that a datamodel will have a version header of some form and will just try to reuse that. The main requirements are that the version class of the datamodel is convertible to apodio::version::Version.

Fixes #630

@m-fila
Copy link
Contributor

m-fila commented Jul 31, 2024

Wouldn't it be more straightforward to just add a simple field for version in the yaml and then relay on a build system to populate it somehow (e.g. configure_file in cmake)? A version header file will be most likely generated by build system too

Or inject the version information with PODIO_GENERATE_DATAMODEL rather than through yaml file?

@tmadlener
Copy link
Collaborator Author

tmadlener commented Jul 31, 2024

I would like to avoid having to configure_file the full datamodel yaml file. Then I went with something that would allow sort of minimal work in EDM4hep, where we already have a version header that we configure via the build system. In the end this is a first proposal, so I am very open for suggestions.

Another possibility I thought about was to configure a smaller YAML file with the version information (but then we might be configuring quite a few files in the end).

Or inject the version information with PODIO_GENERATE_DATAMODEL rather than through yaml file?

Hadn't thought about that, but also a possibility. Would also make it a bit more language agnostic.

In the end how the information is injected exactly is rather flexible in this PR, so it should be fairly straight forward to do what we want to do.

@tmadlener
Copy link
Collaborator Author

I have change the injection of the datamodel version info to be a commond line argument to the class generator instead of being part of the YAML file. I have also added a new VERSION argument to PODIO_GENERATE_DATAMODEL to make this easily accessible from CMake.

@tmadlener tmadlener force-pushed the gen-edm-version-storage branch 2 times, most recently from cc55fd0 to 18a1129 Compare August 28, 2024 14:05
@tmadlener
Copy link
Collaborator Author

I am not sure if it's an argument in any direction, but I would at least like to point it out (and write it down for posteriority): If the version information is not part of the YAML configuration, the "roundtripped generated code" might be different because the generated code now depends on some pretty much ephemeral version input. On the other hand if we pull in a c++ variable (like we did in the first approach) we have the same code, but potentially different contents, because the value of that variable can change without visible changes in the generated code.

@tmadlener
Copy link
Collaborator Author

As discussed during todays EDM4hep meeting, this PR effectively imposes that packages that use podio follow SemVer as we use that format to store the information. There is no real technical blocker to store the version as e.g. a string. However, we will defer implementing that until someone actually requests that feature.

@tmadlener tmadlener closed this Sep 10, 2024
@tmadlener tmadlener reopened this Sep 10, 2024
doc/datamodel_syntax.md Outdated Show resolved Hide resolved
@@ -505,12 +514,14 @@ def _write_all_collections_header(self):
def _write_edm_def_file(self):
"""Write the edm definition to a compile time string"""
model_encoder = DataModelJSONEncoder()
print(f"{self.datamodel_version=}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(f"{self.datamodel_version=}")

Debugging print? Otherwise it's going to be a bit weird like that whenever it appears with self.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, few seconds too late. This is indeed a debug print that I missed while having another look.

@tmadlener tmadlener merged commit 8c7ece3 into AIDASoft:master Sep 10, 2024
18 checks passed
@tmadlener tmadlener deleted the gen-edm-version-storage branch September 10, 2024 13:51
@peremato
Copy link
Collaborator

@tmadlener I am a bit confused. What is the "datamodel version"? I can see in the generated files the PODIO version and the EDM4hep schema version but not datamodel. See what I print using the Julia interface:

┌────────────────┬─────────────────────────┐
│ Atribute       │ Value                   │
├────────────────┼─────────────────────────┤
│ File Name(s)   │ test/EDM4hep_TTree.root │
│ # of events    │ 3                       │
│ IO Format      │ TTree                   │
│ EDM4hep schema │ 2.0.0                   │
│ PODIO version  │ 1.0.1                   │
│ ROOT version   │ 6.32.4                  │
└────────────────┴─────────────────────────┘ 

@tmadlener
Copy link
Collaborator Author

The datamodel version would be the tag for EDM4hep. With the current main branch of EDM4hep you would get 0.99.0. However, you will also need key4hep/EDM4hep#362 first, because that is necessary to pass that information to podio in the first place.

@peremato
Copy link
Collaborator

Thanks. What will be the relation between the schema version and the datamodel version? Will the major version of datamodel be the schema version? What about compatibility rules (at least for I/O) ?

@tmadlener
Copy link
Collaborator Author

The one thing that should always be true is: If the schema version increases, the datamodel version also has to increase (but not necessarily vice versa). How this increase happens is in principle up for the actual datamodel authors to define any relation they want there. IMHO the minimal set of rules are (if we focus more on the API side of things and assume that we have schema evolution for every change we make to the datamodel):

  • Any change of schema version implies a change in datalayout. Hence, this impacts I/O (as it needs some form of schema evolution)
  • Any change of schema version also implies a change of the datamodel. However, what you do in this case might be a bit more nuanced
    • Adding a new datatype or adding a new member to an existing datatype would only imply a bump in minor version if you follow SemVer
    • Removing a datatype or a member from a datatype would imply a bump of the major version if you follow SemVer
    • Renaming members or datatypes would also imply a bump of major version if you follow SemVer strictly

One aspect that could also be considered, although I am not sure we will ever hit it in reality is the following: It would be possible to reserve major version changes to cases where we drop backwards compatibility mainly on the I/O side (i.e. we explicitly exclude the API side from SemVer). This would happen if we don't want to (or can't) provide a schema evolution mechanism for some changes to the datamodel. But I am not sure if we should consider this as a "normal use case".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it possible to store more version information of generated EDMs
4 participants