Information and advice relating to riboviz's memory and storage requirements.
The memory requirements of riboviz depend upon the datasets being processed.
The riboviz developers use development environments with 8-16GB RAM. This has been found to be adequate to run the vignette, Map mRNA and ribosome protected reads to transcriptome and collect data into an HDF5 file. Multiple full-size yeast datasets have been successfully processed on developers' own machines with 16GB RAM available.
One developer had issues when running the vignette as a batch job on the University of Edinburgh ECDF Linux Compute Cluster, Eddie. Their job requested an 8GB node but was terminated for exceeding this limit. On requesting a 16GB node, the job completed successfully, with the system reporting that almost 14GB had been used. It is unclear at present why this is.
Full-size yeast datasets have been successfully processed on Eddie with 16GB RAM.
If running a workflow using deduplication, you might encounter memory issues arising from an issue with UMI-tools. UMI-tools has an option, `--output-stats`, which calculates statistics relating to deduplication. This option can result in excessive memory usage during deduplication (for more information, see the UMI-tools issue "excessive dedup memory usage with output-stats" #409).

The workflow sets `--output-stats` by default. If you find you are having issues with memory usage then you might be able to resolve these by requesting that deduplication statistics are not calculated. You can do this by editing your YAML configuration file and setting:
```yaml
dedup_stats: FALSE
```
We have a ticket for a future release, Profile riboviz to determine memory requirements #179. In the meantime, we welcome contributions from users as to the memory requirements of your analyses, both when running the vignette and your own analyses. Please record an indication of the hardware and operating system you used, the memory you had available, and the size of your input files (FASTA, GFF and FASTQ files), in both bytes and number of lines.
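As a sketch of how to gather these figures, the following shell function (a hypothetical helper, not part of riboviz) reports the byte and line counts for a file; the example file path in the comment is illustrative only:

```shell
# report_size: print size in bytes and number of lines for a file.
# A hypothetical helper, not part of riboviz.
report_size() {
  printf '%s: %s bytes, %s lines\n' \
    "$1" "$(wc -c < "$1" | tr -d ' ')" "$(wc -l < "$1" | tr -d ' ')"
}

# Example usage (substitute your own input file):
# report_size vignette/input/example.fa
```

Note that for compressed FASTQ files a line count of the compressed file is not meaningful; decompress first (e.g. with `zcat`) if you want the number of lines.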
The workflow generates many intermediate files, some of which may be uncompressed and large, i.e. about the same size as the input files. All these files are placed in a temporary directory (`dir_tmp`). The temporary directory's contents can be inspected for troubleshooting, if necessary.
The Nextflow `work` directory has the originals of all temporary and output files.
For example, here is the disk usage of the directories from a run of the vignette as documented in Map mRNA and ribosome protected reads to transcriptome and collect data into an HDF5 file:
| Directory | MB |
| --- | --- |
| `vignette/index` | 9 |
| `vignette/tmp` | 813 |
| `vignette/output` | 3 |
| `work` | 826 |
| Total | 1651 |
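Figures like these can be gathered with `du`. A minimal sketch, assuming the vignette's directory layout (substitute the directories from your own configuration):

```shell
# Report per-directory disk usage in MB after a vignette run.
# Directory names assume the vignette configuration; adjust to your own.
for d in vignette/index vignette/tmp vignette/output work; do
  [ -d "$d" ] && du -sm "$d" || true
done
```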
Tip: We recommend that you regularly delete the temporary and `work` directories once you have completed your analyses to your satisfaction.
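Under the vignette layout, for example, the cleanup might look like the following (the paths are from the vignette configuration; substitute your own `dir_tmp` and Nextflow `work` directory, and note that this permanently deletes files):

```shell
# Remove the temporary and Nextflow work directories after a completed run.
# Paths assume the vignette layout; this permanently deletes these files.
rm -rf vignette/tmp work
```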