Trims .bam from cram files; Adds crai; Trick for G-Actions disk limit #3
base: main
@@ -18,3 +18,4 @@ jobs:
- name: Basic workflow tests
  run: |
    nextflow run ${GITHUB_WORKSPACE} --config conf/test.config
    echo "Results tree view:" ; tree -a results; head results/**/*txt
This adds a printout of the results tree and the head of the text files, because I want to see the CRAM sizes. We cannot inspect them otherwise, since we chose not to store artifacts from the CI.
Additionally, we need to delete the generated data: we hit the disk size limit and the CI fails because of it. See the example here:
https://github.com/lifebit-ai/bam2cram/runs/4233568183?check_suite_focus=true#step:4:201
// delete the actual files to save space in Github Actions
pre_script = "df -h; ls -lh"
post_script = "df -h; ls -lh > metadata.cram.txt; rm *.cram; rm *.crai"
echo = true
- Adding echo = true so that the printed output is visible in the CI test.
input = 'testdata/test_input_cloudos.csv'
reference = 's3://eu-west-1-example-data/nihr/testdata/Homo_sapiens_assembly38.fasta'
report_dir = "/opt/bin"
// delete the actual files to save space in Github Actions
pre_script = "df -h; ls -lh"
post_script = "df -h; ls -lh > metadata.cram.txt; rm *.cram; rm *.crai"
- Adding a custom cleanup script that runs after the process: it captures the sizes of the generated CRAM files in a txt file, then deletes the CRAM and CRAI files to fix the failure caused by the disk size limit.
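To illustrate what the post_script above does, here is a minimal shell sketch that can be run outside the pipeline. The helper name record_and_clean and the dummy files are hypothetical, introduced only for this demo; the real process runs the commands directly in its work directory.

```shell
set -eu

# Hypothetical helper mirroring the post_script: record file sizes first,
# then delete the CRAM/CRAI files to free disk space.
record_and_clean() {
    (
        cd "$1" || exit 1
        # The redirect creates metadata.cram.txt before ls runs,
        # so the listing also contains the (still empty) metadata file.
        ls -lh > metadata.cram.txt
        # metadata.cram.txt ends in .txt, so these globs do not delete it.
        rm -f -- *.cram *.crai
    )
}

# Demo on throwaway files standing in for real pipeline outputs.
workdir="$(mktemp -d)"
printf 'dummy' > "$workdir/sample.cram"
printf 'dummy' > "$workdir/sample.crai"

record_and_clean "$workdir"
cat "$workdir/metadata.cram.txt"
```

Recording the listing before the rm is what keeps the size information available once the large files are gone.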
output:
file "*.cram"
file "*.cra*"
Here I am changing the pattern to also catch the CRAI files (we need them if we want to use the CRAMs for variant calling).
This also lets us pick up any file whose name contains .cra and send it to publishDir, without forcing the suffix to be cram.
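As a quick check of what the *.cra* pattern picks up, the sketch below expands it in a plain shell against some dummy file names (all hypothetical). Nextflow resolves output globs itself rather than through the shell, so treat this as an approximation of the matching behaviour, not a statement about Nextflow internals.

```shell
set -eu

# Throwaway directory with dummy file names (hypothetical).
workdir="$(mktemp -d)"
cd "$workdir"
touch sample.cram sample.cram.crai metadata.cram.txt sample.bam notes.txt

# Expand the same pattern used in the output block.
matches="$(echo *.cra*)"
echo "$matches"
```

Every name containing .cra is matched, which includes the CRAM, its CRAI index, and presumably also the metadata.cram.txt listing written by the post_script; sample.bam and notes.txt are left out.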
script:
"""
samtools view -T $reference -o ${bam_file}.cram -O cram,version=3.0 $bam_file
${params.pre_script}
Adding pre- and post-script sections that are handy for debugging.
We can use them, for example, to check whether we have enough disk space, to see whether we are wasting too much of it, to ls the files, and more.
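One way such a hook could check for free space is sketched below. The threshold min_kb is a hypothetical value chosen for the demo; the pre_script in this PR only prints df -h and ls -lh.

```shell
set -eu

# Available space (in KiB) on the filesystem holding the current directory.
# -P requests the POSIX output format (one line per filesystem), -k uses 1 KiB blocks.
avail_kb="$(df -Pk . | awk 'NR==2 {print $4}')"

# Hypothetical threshold, for illustration only.
min_kb=1024

if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "WARNING: only ${avail_kb} KiB free" >&2
fi
echo "available: ${avail_kb} KiB"
```

A warning rather than a hard failure keeps the hook purely diagnostic, in line with its debugging purpose.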
@cgpu I would say this is ready to merge, what do you say?
Overview
Does this
Purpose
To achieve that
Changes