Reproducible and portable workflows! #121
base: gh-pages
Create hierarchy of recommendations for reproducible and portable workflows
_extras/recommended-practices.md
Outdated
@@ -6,6 +6,12 @@ permalink: /rec-practices/

Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required.

☐ Reproducibility and Portability are essential goals of scientific workflow developers.

- The best way to ensure portability and reproducibility is to rigidly specify the exact environment a tool should run in. Currently a linux image (commonly called a `Docker image`), packaging the exact environment intended by the developer, is the best way to distribute a tool executable. Use `DockerPull` to specify the image. Use an image identifier that is resilient to updates to the container.
Needs line wrapping.
The language is too strong: a software container captures neither the kernel version nor the CPU type, both of which can affect reproducibility. It is merely the most reasonable thing we can do today 🙂
Also, containers are often constructed differently from, or contrary to, the software developer's intentions (if any), or in ways they hadn't even considered, so I'd drop that.
Ideally, a container is configured to operate in an unsurprising manner that is as correct as possible for the majority of users. Lacking that, it should match the workflow author's needs.
Hi @mr-c thanks for taking a look! I've made some changes.
Note that docker images are the current best solution for the software environment reproducibility issue
Recommendations need to be actionable
description for a tool or workflow. These guidelines are presented for consideration on a scale of
usefulness: more is better, not all are required.

☐ Reproducibility and Portability are essential goals of scientific workflow developers.
This sentence no verb :-)
I am arguably a native speaker and I beg to differ. Not only is it a complete and grammatically correct sentence, but it does too have a verb.
Perhaps I was too harsh, my apologies. The point being, this is supposed to be a list of actions: things to do.
@mr-c oh not at all! It did send me on a trip down memory lane to English language class. I vaguely recall that there can be sentences without verbs. I suspect it was "Yes." and "No."
I was entertained by https://english.stackexchange.com/questions/258/shortest-comprehensive-sentence-in-english
Ok, back to work. Many thanks for reviewing! I will address your comments
Ideally a workflow developer would be able to rigidly specify the software and hardware
environment a tool should run in to ensure portability and reproducibility.

- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
Suggested change:
- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
- Currently (2018) the best way approach this ideal is to package the exact software environment in software container
environment a tool should run in to ensure portability and reproducibility.

- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
(such as a `Docker Image`) and specify the image via the `DockerPull` field. Use an image identifier that is
Suggested change:
(such as a `Docker Image`) and specify the image via the `DockerPull` field. Use an image identifier that is
(in the Docker container format) and specify the image via the `DockerPull` field. Use an image identifier that is
- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
(such as a `Docker Image`) and specify the image via the `DockerPull` field. Use an image identifier that is
resilient to updates to the container.
Be specific: what does "image identifier that is resilient to updates to the container" mean? Show an example.
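One possible reading of "resilient to updates" is an identifier that keeps pointing at the same image content even after the repository is updated. The fragment below is a minimal sketch of that reading, not the PR author's wording. The tool name `mytool`, the image `example.org/mytool`, and the tag `1.2.3` are placeholders invented for illustration; the field itself is spelled `dockerPull` in the CWL specification.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: mytool            # hypothetical tool name
hints:
  DockerRequirement:
    # Pin an explicit version tag rather than a floating tag such as `latest`.
    # The image name and tag are placeholders, not a real image.
    dockerPull: example.org/mytool:1.2.3
inputs: []
outputs: []
```

A version tag can still be re-pointed by whoever controls the repository. Docker also accepts a content digest (`name@sha256:...`), which identifies one exact image, but whether a given CWL runner passes digest references through unchanged is an assumption worth checking before recommending it.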
resilient to updates to the container.
- If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement`
is the next best resort. Be aware that changes in the tool repositories the tools are being pulled from
may silently change the behavior of the tool at each run.
Suggested change:
may silently change the behavior of the tool at each run.
may silently change the behavior of the tool in the future.
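For the `SoftwareRequirement` fallback discussed here, a minimal sketch might look like the following. The package name `mytool` and version `1.2.3` are hypothetical; `packages`, `package`, and `version` are the field names defined by the CWL specification.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: mytool            # hypothetical tool name
hints:
  SoftwareRequirement:
    packages:
      - package: mytool        # hypothetical package name
        # List the exact version(s) the tool description was tested with.
        version: ["1.2.3"]
inputs: []
outputs: []
```

Naming an exact version narrows what a runner may install, but it does not stop the upstream repository from changing what that version resolves to, which is the risk the text above warns about.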
may silently change the behavior of the tool at each run.
- Not specifying a docker image or software requirements will result in a non-reproducible,
non-portable workflow!
- Do specify CPU and memory requirements where required
That should be a separate suggestion with its own check box and example.
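A separate checklist item for CPU and memory could carry an example along these lines. This is a sketch only: the tool name and the numbers are arbitrary placeholders, while `coresMin` and `ramMin` are `ResourceRequirement` fields from the CWL specification, with `ramMin` expressed in mebibytes.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: mytool            # hypothetical tool name
hints:
  ResourceRequirement:
    coresMin: 4                # minimum number of CPU cores to reserve
    ramMin: 8192               # minimum RAM to reserve, in mebibytes (8 GiB)
inputs: []
outputs: []
```

Listing these under `hints` lets a runner that cannot honor them still attempt the job; moving them to `requirements` would make them mandatory.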
IMHO it would be nice to revisit this one. I added FAIR somewhere in the user guide, or at least I had it in the first drafts, as I realized we didn't have much about FAIR-ness, reproducibility, etc. in the user guide. By coincidence, I am writing some documentation for a project at $work that needed to include requirements for portability of workflows. I found the Cylc documentation a good starting point, so it could be useful to take a look at that doc when reviewing this PR too: https://cylc.github.io/cylc-doc/stable/html/workflow-design-guide/portable-workflows.html Cheers