Disk space consumption with --gvcf option #48

Closed
kim-fehl opened this issue Sep 7, 2021 · 4 comments
Labels
enhancement (New feature or request)

Comments

kim-fehl commented Sep 7, 2021

After running the analysis with the --gvcf option on a 50 GB BAM file containing 4 ONT runs, with the hg19 reference, the resulting tmp output subfolder takes 419 GB, plus 117 GB in the main output folder. It would probably make sense to remove the partial VCF files once they have been concatenated and sorted, and to compress the output. For instance, a 117 GB GVCF file takes only 8.5 GB when bzip2-compressed, and tools such as lbzip2 can decompress it in parallel. Perhaps you want to minimize dependencies, but disk space efficiency also matters when renting servers with fast SSDs.

547M	./tmp/full_alignment_output/candidate_bed
3.6G	./tmp/full_alignment_output
233G	./tmp/gvcf_tmp_output
117G	./tmp/merge_output
18G	./tmp/pileup_output
174M	./tmp/phase_output/phase_vcf
48G	./tmp/phase_output/phase_bam
48G	./tmp/phase_output
419G	./tmp
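
To illustrate the compress-then-delete idea, here is a minimal sketch in Python using only the standard library. The function name `compress_and_remove` and the chunk size are hypothetical and chosen for illustration; this is not how the tool itself handles its output.

```python
import bz2
import os
import shutil


def compress_and_remove(path, chunk_size=1 << 20):
    """Stream `path` into `path + '.bz2'`, then delete the original.

    Hypothetical helper for illustration only.
    Returns (original_bytes, compressed_bytes).
    """
    compressed_path = path + ".bz2"
    original_bytes = os.path.getsize(path)
    with open(path, "rb") as src, bz2.open(compressed_path, "wb") as dst:
        # Copy in fixed-size chunks so memory use stays flat for huge files.
        shutil.copyfileobj(src, dst, chunk_size)
    # Remove the uncompressed file only after the archive is fully written.
    os.remove(path)
    return original_bytes, os.path.getsize(compressed_path)
```

The result is a standard bzip2 file, so it should be readable by any bzip2-compatible decompressor. The achievable ratio depends on the data.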
aquaskyline added the enhancement label on Sep 7, 2021
@aquaskyline
Member

Will come back later with a solution.

@aquaskyline
Member

In the next release, we will 1) compress the intermediate files for GVCF output, and 2) provide an option for users to delete intermediate files as soon as they are no longer needed.
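
The two points above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual implementation: `merge_parts` and its `delete_intermediate` parameter are hypothetical names, plain gzip is used only because it ships with the Python standard library, and header handling and sorting of the partial files are left out.

```python
import gzip
import os
import shutil


def merge_parts(part_paths, merged_path, delete_intermediate=False):
    """Concatenate `part_paths`, in order, into one gzip file at `merged_path`.

    Hypothetical helper for illustration only.
    """
    with gzip.open(merged_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
            if delete_intermediate:
                # The part has been copied into the archive, so free its space now.
                os.remove(part)
    return merged_path
```

Deleting each part right after it is copied keeps peak disk usage low, at the cost of not being able to resume from the parts if the merge is interrupted.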

@aquaskyline
Member

aquaskyline commented Sep 24, 2021

Scheduled for the v0.1-r7 release.

@aquaskyline
Member

v0.1-r7 released with #61
