Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move input coordinates to extra subsection #30

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

observingClouds
Copy link
Contributor

@observingClouds observingClouds commented Oct 30, 2024

To-Do:

  • Where are the config schemas defined? The tests contained both version 0.1.0 and 0.2.0

    - introduce `coordinates` subsection
    - prepare for additional variable subsections
@leifdenby
Copy link
Member

  • introduce coordinates subsection

I think this is a good idea, but we should discuss how to deprecate/change the config schema.

[ ] Where are the config schemas defined? The tests contained both version 0.1.0 and 0.2.0

We should discuss this. Sorry this is missing in the README, I could've typed up my current thinking.

I don't have a clear plan for this though, so would appreciate your input. I think it is quite hard to reconcile having a config schema version and a package version. I had thought that instead of versioning the schema separately from the package it might be simpler do to this together. My idea was that every time we create a new schema that would be versioned with the same version number as the package. As in, if we see the need to change the schema then that would only be allowed to change when we release a new tagged version of the package, and then schema version will match that same package version. This answer the question of "with this given config that uses a the config spec version what minimum version of mllam-data-prep will I need?", but doesn't handle backward compatibility. It might be that we should enforce that configs would be backwards compatible, but for how long?

@observingClouds
Copy link
Contributor Author

I feel like the versioning and deprecating of configs should be something that is well established, but I could not really find any general solutions.

I currently think we should:

  • collect the schemas in a separate folder for better visibility and being able to keep several versions at the same time
  • use pydantic to validate input and define migration functions and version incompatibilities

@joeloskarsson
Copy link
Contributor

I like the idea of the schema version matching the package version, keeps things simple.

For now I don't think there is a need to worry about backwards compatibility until a 1.0.0 release. Better to build things to a state where it is specified in a way that we find useful to work with. Then that version can be fixed. After 1.0.0 one could follow semantic versioning for the schema as well, only allowing breaking backwards compat. when releasing new major versions.

@leifdenby
Copy link
Member

just a quick comment on your implementation here. Currently you are suggesting to put selection along coordinates into a new subsection called coordinates in the config for selecting a specific variable

Screenshot 2024-11-01 at 20 37 02 (sorry I'm not sure how to write diff comment...)

I think is sensible since this is more explicit, but maybe we should call the subsection sel (to indicate that it uses the xr.DataArray.sel(...) method) or sel_coords or something like that? coords might suggest we are somehow setting the coordinates instead. I had initially not introduced this subsection because I couldn't think of anything else we might want to do. But one might for example want to ensure that the units are correct for a the variable you are selecting. In general I am realising now that it is currently implicit that the variables section is about selecting variables, whereas one might mistakingly assume that it is also possible to set things (like the units if they are missing on a variable). I can see a need for that in future if the source file someone is trying to read doesn't have for example the long_name or units attributes set. What do you think?

@observingClouds
Copy link
Contributor Author

That this section is mostly about selecting has indeed be not clear to me. However, if we call the subsection coordinates both the selection as well as the expected units can be part of this section. If we would use sel or sel_coords, units would no longer fit to this section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants