-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce common input section that will be applied to all inputs #31
Comments
If you have time @observingClouds I would appreciate your thoughts on this :) |
Is there a specific reason other than trying to avoid duplications of entries? I feel like the config files are quite short so far with only a few file definitions and adding duplicate information wouldn't really reduce the readability. In case we would include this common key, I would argue for having it outside of the input keys to not mix it with variable names and make validations of the json files easier, e.g.
|
Yes, the main reason is that having repetitions makes errors hard to spot (e.g. typos where parts shouldn't be the same, but aren't) and sometimes it might even be unclear that some section actually share parts of their config, this is basically the DRY principle :) This feature isn't a necessity thought, but I thought it might be a nice thing to do to keep the config tidier.
Ok, interesting. I don't think for the yaml-file validation this would make things easier though since my idea is imply to copy the values of the Having a "global" (as in the root of the config tree) |
Yes, I could imagine that there might be use cases too.
Yes, BUT currently you can for example just loop through the datasets ( |
The problem
In working on the datastores functionality in neural-lam I needed the ability to define the projection information for the grid-coordinates in a zarr datasets created by
mllam-data-prep
. To do this a had a bit of a conundrum: where in the config to store this info? In the situation where all input datasets are given on the same projection it would be ideal to define this information only once in the config, but to make it clear that this information applies to every single source. Currently, it isn't possible to set config parameters "globally" for all inputs.The proposition here is to add this common section
For context I have currently made (on a separate branch #18, which I would like to replace with a better implementation) what I feel like is a bit of a hack: adding an
extra
section to the config which the user can use to set things in the config that mllam-data-prep should ignore but can be used by other tools that read the same config. I don't really like this because this projection information is not used or even validated bymllam-data-prep
. To add the domain-cropping that @TomasLandelius is planning to implement having this projection information understood bymllam-data-prep
will also be key I think, so thatmllam-data-prep
knows how to get the latitude and longitude for each grid-point for each input dataset being used in a config.Proposition
Add a new optionally specially-named input called
__common__
the contents of which will be copied to all the inputs given.So instead of writing:
one could write
This would then enable me to add an optional
projection
config parameter which I will do in a separate PR.To make this as simple as possible it will be implemented so that collisions between keys in the common config and the config for a specific input isn't allowed. The process of merging the common config into the config a specific input is in effect a process of merging two tree structures. The leaf-nodes are the values in the
input
dictionary, and the branch-nodes are the nested keys. The merging will be allowed to introduce new keys, but not overwrite values.The text was updated successfully, but these errors were encountered: