Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSL2 Modules: Support .diff files for minor changes #1312

Closed
2 of 3 tasks
ewels opened this issue Nov 8, 2021 · 8 comments · Fixed by #1708
Closed
2 of 3 tasks

DSL2 Modules: Support .diff files for minor changes #1312

ewels opened this issue Nov 8, 2021 · 8 comments · Fixed by #1708
Assignees
Labels
command line tools Anything to do with the cli interfaces modules Related to tools for working with DSL2 modules

Comments

@ewels
Copy link
Member

ewels commented Nov 8, 2021

Whilst the central nf-core/modules repo of shared DSL2 modules is amazing, we may face occasions when pipelines need to have minor differences that should not be in the main modules repository. For example, nf-core/methylseq needs the genome index on a per-sample basis, which is not consistent with the rest of nf-core/modules.

At the moment, the only solution is to keep local copies of modules. This is undesirable, as it will lead to inevitable seperation creep between the local module and the central repository. Local modules can't be updated and fixes are less likely to be contributed back.

Instead of this, I propose that nf-core/tools supports the existence of .diff / .patch files for modules, which allow pipelines to apply minor changes on top of centralised modules. This keeps the two in sync for updates, but allows minor variations.

Like much of what we do, Bioconda has beaten us to this. For example, see the ISACC recipe:

meta.yaml:

  patches:
    - patch-configure.diff
    - patch-src-cplusplus-lib-alignment-TemplateBuilder.cpp.diff
    - patch-src-cplusplus-lib-io-FileBufWithReopen.cpp.diff

patch-configure.diff:

--- src/configure.orig	2020-05-20 21:40:21.259601460 +0100
+++ src/configure	2020-05-20 21:41:13.987993770 +0100
@@ -270,8 +270,7 @@
 # after they appear there.
 isaac_cmake_install_dir="${isaac_build_dir}/bootstrap_cmake"
 if [ "x${isaac_cmake}"       == "x" ] ; then 
-    "${isaac_bootstrap_dir}"/installCmake.sh "${isaac_redist_dir}" "${isaac_cmake_install_dir}" "${isaac_parallel}"
-    cmake_install_exit_code="$?"
+    cmake_install_exit_code="1"
     if [ "$cmake_install_exit_code" == "1" ]; then
         isaac_cmake=cmake
         echo "Using existing `which cmake`"

So we could have a field in modules.json that points to one or more patch .diff files in the pipeline repo. These patches would be applied when installing / updating modules, and taken into consideration when linting.

For a concrete example with nf-core/methylseq, see ewels/nf-core-methylseq#2

  • nf-core modules install
  • nf-core modules update
  • nf-core modules lint
@ewels
Copy link
Member Author

ewels commented Nov 24, 2021

Maybe need a new modules subcommand to generate and update these .diff files?

Potential ideas:

  • nf-core modules local-changes bismark/align
  • nf-core modules local-changes --all
  • nf-core modules local-changes with interactive prompts?

So workflow would be:

  • Install / update vanilla modules
  • Maybe force a commit / clean git history?
  • Make local changes to module files
  • Run nf-core modules local-changes to detect these local changes and generate diff files, update modules.json
  • Automatically commit? (maybe with interactive prompts?)
  • Continue to use nf-core modules update etc as before, diffs applied after each update

Thoughts?

@Midnighter
Copy link
Contributor

Just an opinion: I find patch a bit snappier than local-changes, i.e., nf-core modules patch.

So you imagine one should run the command with the module being in dirty state? Then the command would generate the diff and create a patch file and reset the original module to the commit defined in the modules.json? Or do you think the module modifications should be committed, too, so that the patch file is just a record of that change?

Then what happens on update? You force overwrite the module with the nf-core/modules version and re-apply that patch? In case of conflicts in applying the patch, they need to be handled manually and a new patch is generated when calling the command again?

@ewels
Copy link
Member Author

ewels commented Nov 24, 2021

Agree, patch is perfect 👍🏻 I was trying to avoid over-using diff but also felt that loca-changes was a bit clunky. Let's go for patch instead..

I think that the module modifications should be committed. We want the pipeline files to work out of the box as flat files wherever possible I think, to keep things simple. So yeah, the patch file is just to record changes to allow the linting and module updates to work.

Then what happens on update? You force overwrite the module with the nf-core/modules version and re-apply that patch? In case of conflicts in applying the patch, they need to be handled manually and a new patch is generated when calling the command again?

Yes, exactly 👍🏻 When running the update it'll work exactly as it does currently - checking the local version passes linting, delete it, download the new copy, update modules.json and then reapply the patch file.

In case of a merge, we need to bail with an error. Presumably we leave the vanilla version of the module from nf-core/modules in place but with a dirty git state. Then the user manually reapplies the changes and runs nf-core modules patch again to regenerate the patch file.

Need to make sure we don't end up with orphaned patch files is the only thing. So maybe we need to keep them in a canonical directory location so that we can lint that all of them are referenced in modules.json to force developers to delete old orphaned patch files.

@Midnighter
Copy link
Contributor

@ewels I see that diffing is also handled in the PR that you linked. How do you see the interaction between the patch and the update command? I think, there should be one obvious way to do what you want and having both commands potentially generate diff files could be confusing. What do you think?

@ewels
Copy link
Member Author

ewels commented Dec 1, 2021

The PR is for something completely different, hence the potential for confusion 😬 That PR shows a diff of what will be changed in a module when you update. Basically a preview of what will be done when you run the update command. So for example, differences between version 1.1 of a module and 1.2 of a module. It has nothing to do with local modifications of a module.

Suggestions for how to make this distinction as clear as possible welcome!

@Midnighter
Copy link
Contributor

I see, thank you for the clarification.

@ewels ewels added the modules Related to tools for working with DSL2 modules label Apr 13, 2022
@ErikDanielsson ErikDanielsson mentioned this issue Jul 20, 2022
4 tasks
@ErikDanielsson ErikDanielsson linked a pull request Aug 1, 2022 that will close this issue
4 tasks
@ewels
Copy link
Member Author

ewels commented Aug 18, 2022

This is SO FREAKING COOL

@ewels ewels closed this as completed Aug 18, 2022
@Midnighter
Copy link
Contributor

❤️ @ErikDanielsson this will be so great to use. Thanks a lot for making an implementation. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
command line tools Anything to do with the cli interfaces modules Related to tools for working with DSL2 modules
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants