
Reproducible and portable workflows! #121

Open. Wants to merge 6 commits into the gh-pages base branch.



@ghost commented Oct 17, 2018

Create hierarchy of recommendations for reproducible and portable workflows

@ghost requested review from mr-c, stain and tetron October 17, 2018 14:57
mr-c previously requested changes Oct 18, 2018
@@ -6,6 +6,12 @@ permalink: /rec-practices/

Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required.

☐ Reproducibility and Portability are essential goals of scientific workflow developers.

- The best way to ensure portability and reproducibility is to rigidly specify the exact environment a tool should run in. Currently a linux image (commonly called a `Docker image`), packaging the exact environment intended by the developer, is the best way to distribute a tool executable. Use `DockerPull` to specify the image. Use an image identifier that is resilient to updates to the container.
Member

Needs line wrapping.

Language is too strong: a software container doesn't capture the kernel version or the CPU type, which can affect reproducibility. It is merely the most reasonable thing we can do today 🙂

Also, containers are often constructed differently from, or contrary to, the software developer's intentions (if any), or in ways the developer never considered, so I'd drop that.

Ideally, a container is configured to behave unsurprisingly and as correctly as possible for the majority of users. Failing that, it should match the workflow author's needs.

Author

Hi @mr-c, thanks for taking a look! I've made some changes.

Kaushik Ghose added 2 commits October 18, 2018 16:06
Note that docker images are the current best solution for the software environment reproducibility issue
Member

@mr-c left a comment

@mr-c dismissed their stale review October 19, 2018 09:42

whoops, meant this review for the other PR

Member

@mr-c left a comment

Recommendations need to be actionable

description for a tool or workflow. These guidelines are presented for consideration on a scale of
usefulness: more is better, not all are required.

☐ Reproducibility and Portability are essential goals of scientific workflow developers.
Member

This sentence no verb :-)

Author

I am arguably a native speaker and I beg to differ. Not only is it a complete and grammatically correct sentence, but it does too have a verb.

Member

Perhaps I was too harsh; my apologies. The point is that this is supposed to be a list of actions: things to do.

Author

@mr-c oh not at all! It did send me on a trip down memory lane to English language class. I vaguely recall that there can be sentences without verbs. I suspect it was "Yes." and "No."

I was entertained by https://english.stackexchange.com/questions/258/shortest-comprehensive-sentence-in-english

OK, back to work. Many thanks for reviewing! I will address your comments.

Ideally a workflow developer would be able to rigidly specify the software and hardware
environment a tool should run in to ensure portability and reproducibility.

- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
Member

Suggested change (first line: current text; second line: proposed text):
- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
- Currently (2018) the best way approach this ideal is to package the exact software environment in software container

environment a tool should run in to ensure portability and reproducibility.

- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
(such as a `Docker Image`) and specify the image via the `DockerPull` field. Use an image identifier that is
Member

Suggested change (first line: current text; second line: proposed text):
(such as a `Docker Image`) and specify the image via the `DockerPull` field. Use an image identifier that is
(in the Docker container format) and specify the image via the `DockerPull` field. Use an image identifier that is


- Currently (2018) the best way approach this ideal is to package the exact software environment in an image
(such as a `Docker Image`) and specify the image via the `DockerPull` field. Use an image identifier that is
resilient to updates to the container.
Member

Be specific: what does "image identifier that is resilient to updates to the container" mean? Show an example.
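For illustration, here is a minimal sketch of what the requested example might look like, assuming CWL v1.0 syntax. Note that in CWL the field is spelled `dockerPull` and sits under `DockerRequirement`. The tool name `mytool`, the image `example/mytool`, and the `<digest-of-the-tested-image>` placeholder are hypothetical; substitute the real `sha256` digest of the image you actually tested.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Hypothetical tool name, for illustration only.
baseCommand: mytool
requirements:
  DockerRequirement:
    # A mutable tag such as "example/mytool:latest" may point at a different
    # image after each push. A fixed version tag is safer, and a content
    # digest (name@sha256:...) always refers to the same image.
    # Hypothetical image name; the digest is a placeholder to fill in.
    dockerPull: example/mytool@sha256:<digest-of-the-tested-image>
inputs: []
outputs: []
```

Pinning by digest is the strictest option; the trade-off is that security fixes in the base image are not picked up until someone deliberately updates the digest. The digest of a locally pulled image can be listed with `docker images --digests`.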

resilient to updates to the container.
- If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement`
is the next best resort. Be aware that changes in the tool repositories the tools are being pulled from
may silently change the behavior of the tool at each run.
Member

Suggested change (first line: current text; second line: proposed text):
may silently change the behavior of the tool at each run.
may silently change the behavior of the tool in the future.
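As a sketch of the `SoftwareRequirement` fallback discussed above, assuming CWL v1.0 syntax: the package name `mytool` and version `1.2.3` are hypothetical. How the package is actually installed is left to the workflow runner, which is exactly why upstream repository changes can alter behavior over time.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Hypothetical tool name, for illustration only.
baseCommand: mytool
hints:
  SoftwareRequirement:
    packages:
      # Hypothetical package and version: list the exact version(s) you
      # have tested rather than leaving the version open.
      - package: mytool
        version: ["1.2.3"]
inputs: []
outputs: []
```

Placing it under `hints` rather than `requirements` lets runners that cannot resolve software packages still attempt the tool; CWL also allows an optional `specs` list of identifier URLs per package to disambiguate which software is meant.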

may silently change the behavior of the tool at each run.
- Not specifying a docker image or software requirements will result in a non-reproducible,
non-portable workflow!
- Do specify CPU and memory requirements where required
Member

That should be a separate suggestion with its own check box and example.
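A possible example for such a separate CPU and memory suggestion, sketched under CWL v1.0 syntax with `ResourceRequirement`. The tool name and the numbers are hypothetical; in CWL, `ramMin` is expressed in mebibytes.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Hypothetical tool name, for illustration only.
baseCommand: mytool
requirements:
  ResourceRequirement:
    coresMin: 4    # minimum number of CPU cores to reserve (hypothetical value)
    ramMin: 8192   # minimum RAM in mebibytes: 8192 MiB = 8 GiB (hypothetical value)
inputs: []
outputs: []
```

Declaring minimums lets a scheduler place the step on a suitable node instead of failing partway through on an undersized one.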

@swzCuroverse
Contributor

@kinow, @mr-c should we revisit this given the new format of the user guide?

@kinow
Member

kinow commented Oct 15, 2022

IMHO it would be nice to revisit this one. I added FAIR somewhere in the user guide, or at least I had it in the first drafts, as I realized we didn't have much about FAIR-ness, reproducibility, etc., in the user guide.

By coincidence, I am also writing documentation for a project at $work that needs to include requirements for workflow portability. I found the Cylc documentation a good starting point, so it might be useful to look at it when reviewing this PR too: https://cylc.github.io/cylc-doc/stable/html/workflow-design-guide/portable-workflows.html

Cheers
Bruno


3 participants