Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the import of the new industry module #386

Merged

Conversation

timtroendle
Copy link
Member

There were two problems with the industry module that broke the main workflow:

  • all rule was overwritten
  • config was overwritten

These are fixed here.

Checklist

Any checks which are not relevant to the PR can be pre-checked by the PR creator. All others should be checked by the reviewer. You can add extra checklist items here if required by the PR.

  • CHANGELOG updated
  • Minimal workflow tests pass
  • Tests added to cover contribution
  • Documentation updated
  • Configuration schema updated

@brynpickering
Copy link
Member

what, if any, disadvantage is there of having the industry config unavailable from the top-level? E.g., if submodularising Euro-Calliope, how would one now provide overrides for default industry module config items?

@timtroendle
Copy link
Member Author

I don't know how and whether that would work at all. It would be good to test that. I'd personally have the config in the top-level, but isn't that somewhat out of scope?

@timtroendle
Copy link
Member Author

Still, good comment because it made me available that I broke the config and that went through unnoticed because of a additionalProperties: true.

@brynpickering
Copy link
Member

I don't know how and whether that would work at all. It would be good to test that. I'd personally have the config in the top-level, but isn't that somewhat out of scope?

I suppose it is, if our ultimate aim is to move the industry "module" out of the core EC repo.

@timtroendle
Copy link
Member Author

I agree. Before we do that, we should have a proper plan of the pros and cons though. I wouldn't do that any time soon -- likely not before the next release.

@timtroendle
Copy link
Member Author

Having said that, I figured there is another problem: #387.

@irm-codebase
Copy link
Contributor

irm-codebase commented May 31, 2024

I do not approve this pull request., it would break our entire design.

The module was designed according to snakemake specs:
https://snakemake.readthedocs.io/en/stable/snakefiles/modularization.html#modules

  • Adding another config file does not break anything as long as you do not overwrite anything in the dictionary. This is why I added the industry: at the top of the industry yaml. Please see here: https://stackoverflow.com/questions/62696099/snakemake-multiple-config-files
  • Renaming module rules completely avoids any and all potential conflicts with all rules. There is no need to create a specific .smk for this.

In theory any EC project will be completely free to use the industry module as they see fit once we move things outside.

Please reconsider merging this in.

@irm-codebase
Copy link
Contributor

@timtroendle

Can you please try doing the following in the code without your changes?
This fixes everything: no configuration conflicts, we still have separate module configuration files, and we still import modules as suggested by snakemake.

rule all:
    message: "Generate euro-calliope pre-built models and run tests."
    localrule: True
    input:
        "build/logs/continental/test.success",
        "build/logs/national/test.success",
        expand(
            "build/models/{resolution}/{file}",
            resolution=["continental", "national", "regional", "ehighways"],
            file=["example-model.yaml", "build-metadata.yaml", "summary-of-potentials.nc", "summary-of-potentials.csv"]
        )
    default_target: True   # <----- This right here

@brynpickering
Copy link
Member

brynpickering commented May 31, 2024

@irm-codebase I'm not sure this change breaks things as you suggest. The module rules are renamed as suggested, they are just "imported" from the industry.smk file rather than taking up space in the main Snakefile (and being considered as the default rule when e.g. creating the DAG).

We could (and should) fix the default rule issue by using the default_target flag mentioned by @irm-codebase, but I agree with @timtroendle that it is still neater to have the module in a separate file. Any other modules of this kind can have their own .smk. When modules get moved into their own repos, we just kill that .smk file. My concern is that with more modules of this kind, the Snakefile will become increasingly messy.

I'd stick with the config file as it was (under the industry top-level heading) and loading it as part of the core config, so that it is easy to modify along with all other EC config options. Other changes to the schema are good by me.

@brynpickering
Copy link
Member

brynpickering commented May 31, 2024

This would be the content of rules/industry.smk if it were moved into its own file:

configfile: "modules/industry/config.yaml"
validate(config["industry"], "../modules/industry/schema.yaml")

module module_industry:
    snakefile: "../modules/industry/industry.smk"
    config: config["industry"]
use rule * from module_industry as module_industry_*

In addition, the first level of the schema would be removed and additionalProperties: false set. I agree with @timtroendle that robust validation against the schema requires this flag changed to False.

@irm-codebase
Copy link
Contributor

irm-codebase commented May 31, 2024

@brynpickering I can settle for that!

Just please keep the industry top level heading in industry's config.yaml. That one is my main worry because it will cause potential conflicts between parameters in modules. Module devs should not care about what names others are using in their modules.

As for industry.smk: how about an in-between like module_imports.smk?
This way we do not have 20 different files as we modularise more. Unless we thing we'll add additional rules per imported module down the line...

For additionalProperties: false I also agree. That one is probably a mistake on my end, I think I set it to false for levels below that.

Copy link
Contributor

@irm-codebase irm-codebase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are general comments on how to fix these bugs while keeping modularity.

@@ -1,13 +1,12 @@
industry:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the industry top level. Otherwise, modules and general configuration will conflict between each other if developers use similar names.

properties:
industry:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. The definition of this level is crucial for modules to work properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to remove it from here (and necessary for schema validation given that the config is merged with the rest of the project config). validating would become validate(config["industry"], "../modules/industry/schema.yaml") which means industry: should not be in the schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to have it here so that the module makes sure you are specifying the module configuration separately. Otherwise, someone might add all these at the top level, causing conflicts!

The only limitation that this might impose is that modules cannot expect the same name, which is good, imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider using one file for module imports, not individual files per module. This should minimize bloat down the line.

We can go with individual files if we think we may write additional module rules at the project level (snakemake allows this).

I do not have strong opinions on this, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the one file idea.

Snakefile Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core bug is fixed by adding default_target: True to rule all:

This should avoid having to heavily modify the industry module beyond the schema improvements.

@brynpickering
Copy link
Member

Sounds good @irm-codebase. Happy to make the updates @timtroendle ?

@irm-codebase
Copy link
Contributor

@timtroendle @brynpickering
After some testing, I remember now why additionalProperties: true was set this way.
It was so you could just pass the whole configuration file.

The reasoning was this:

  • the schema should check that the industry top level was specified (to avoid conflicts with other modules).
  • the schema should ignore (i.e., allow) configuration for other modules and the main configuration.

If you set it to false, the test will fail because it will see other stuff in there. So do keep it, please!

@brynpickering
Copy link
Member

If you set it to false, the test will fail because it will see other stuff in there. So do keep it, please!

Not if you only pass config["industry"] to the validation. Setting it to true means misspellings will not be picked up, which is one of the main benefits of the schema.

@irm-codebase
Copy link
Contributor

If you set it to false, the test will fail because it will see other stuff in there. So do keep it, please!

Not if you only pass config["industry"] to the validation. Setting it to true means misspellings will not be picked up, which is one of the main benefits of the schema.

The schema is directly checking for industry, so spelling checking is quite explicit here (in terms of ensuring that users specify it). But... if they set idnustry someplace else it won't be detected, so I see your point.

I guess it's just preference, then. Either is fine.

@timtroendle
Copy link
Member Author

Sorry this was based on my wrong assumption that one cannot call "configfile" twice. I changed according to all your comments, @irm-codebase and @brynpickering , please have another look.

Copy link
Contributor

@irm-codebase irm-codebase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. But I added one last suggestion.

@@ -0,0 +1,9 @@
# Industry
configfile: "./modules/industry/config.yaml"
validate(config["industry"], "../modules/industry/schema.yaml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one final improvement.

I think we should move this validation within the module's smk. This way, we avoid users skipping validation because they find it cumbersome and always enforce it.

That way, we still force users to separate a module's configuration due to additionalProperties: false (although it's not really explicit code-wise).

Could you test this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, also in terms of separation of concerns. I tested it and it works.

Copy link
Contributor

@irm-codebase irm-codebase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

I'll make sure to merge these changes into future module updates.

@timtroendle
Copy link
Member Author

I have one more comment myself. Personally, I don't think the "industry" key in the config in ./modules/industry/config.yaml makes sense. The key does not exist in the context of the module, it only exists when added to the main workflow. That's why it should only exist within the context of the main workflow. Do you agree?

What I suggest to do:

  • Remove the industry key in the config that lives within the module.
  • Load the config as a default within the module.
  • Overwrite the config from the main workflow, using the "industry" key. This will cause a redundancy, but it's the cleanest approach to me. It will make the module truly independent.

@irm-codebase
Copy link
Contributor

irm-codebase commented Jun 2, 2024

I have one more comment myself. Personally, I don't think the "industry" key in the config in ./modules/industry/config.yaml makes sense. The key does not exist in the context of the module, it only exists when added to the main workflow. That's why it should only exist within the context of the main workflow. Do you agree?

What I suggest to do:

* Remove the industry key in the config that lives _within_ the module.

* Load the config as a default _within_ the module.

* Overwrite the config from the main workflow, using the "industry" key. This will cause a redundancy, but it's the cleanest approach to me. It will make the module truly independent.

I agree. Down the line, the configuration inside the module will only serve as an example. In theory, projects should be able to specify their own configuration, overriding the default. So I do not see this as redundant at all.

However, I tried the approach you mention, but it lead to issues. Snakemake would return an error when it detected the configfile: command inside the module. This was despite what their wiki said. I am not sure if 8.10 fixed this issue.

There is another reason why I kept the configuration within the folder: it's because the module is still in development, and I want to avoid doubling efforts by having a config file in another location as well. Once it's fully ready, we should take your approach.

I added this note to #381. Still, please let's avoid having two config files before the rest of the steps are done.

@irm-codebase
Copy link
Contributor

irm-codebase commented Jun 2, 2024

One last argument in favor of keeping the "industry" key even for the example configuration @timtroendle :

It keeps correct module compartmentalization explicit.

I believe there should be minimal friction between the configuration file that users see as an example, and how users will actually employ it. If we remove this level, this becomes implicit: we need extra documentation or users reading the schema. Otherwise they will get errors.

I know it's a minor detail, but I think that this kind of thing is what makes code easy to use for others.

We can go either way, of course.

@timtroendle
Copy link
Member Author

Ok, that's fine by me. @brynpickering any other comments or do you approve?

@brynpickering brynpickering merged commit 7f4e50a into calliope-project:develop Jun 6, 2024
4 checks passed
@timtroendle timtroendle deleted the fix-industry-module branch June 6, 2024 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants