Information and advice relating to riboviz's memory and storage requirements.
The memory requirements of riboviz depend upon the datasets being processed.
The riboviz developers use development environments with 8-16GB RAM. This has been found to be adequate to run the vignette, Map mRNA and ribosome protected reads to transcriptome and collect data into an HDF5 file. Multiple full-size yeast datasets have been successfully processed on developers' own machines with 16GB RAM available.
One developer had issues when running the vignette as a batch job on the University of Edinburgh ECDF Linux Compute Cluster, Eddie. Their job requested an 8GB node but was terminated for exceeding this limit. On requesting a 16GB node, the job completed successfully, with the system reporting that almost 14GB had been used. It is unclear at present why this is.
Full-size yeast datasets have been successfully processed on Eddie with 16GB RAM.
If running a workflow using deduplication, you might encounter memory issues arising from an issue with UMI-tools. UMI-tools has an option, `--output-stats`, which calculates statistics relating to deduplication. This option can result in excessive memory usage during deduplication (for more information, see the UMI-tools issue "excessive dedup memory usage with output-stats" #409).

The workflow sets `--output-stats` by default. If you find you are having issues with memory usage then you might be able to resolve these by requesting that deduplication statistics are not calculated. You can do this by editing your YAML configuration file and setting:
```yaml
dedup_stats: FALSE
```
We have a ticket for a future release, Profile riboviz to determine memory requirements #179. In the meantime, we welcome contributions from users as to the memory requirements of your analyses, both when running the vignette and your own analyses. Please record an indication of the hardware and operating system you used, the memory you had available, and the size of your input files (FASTA, GFF and FASTQ files), in both bytes and number of lines.
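As a sketch of how to gather these figures, the following shell function (a hypothetical helper, not part of riboviz) reports the byte and line counts for a file; the example file path in the comment is illustrative only:

```shell
# report_size: print size in bytes and number of lines for a file.
# A hypothetical helper, not part of riboviz.
report_size() {
  printf '%s: %s bytes, %s lines\n' \
    "$1" "$(wc -c < "$1" | tr -d ' ')" "$(wc -l < "$1" | tr -d ' ')"
}

# Example usage (substitute your own input file):
# report_size vignette/input/example.fa
```

Note that for compressed FASTQ files a line count of the compressed file is not meaningful; decompress first (e.g. with `zcat`) if you want the number of lines.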
The workflow generates many intermediate files, some of which may be uncompressed and large, i.e. about the same size as the input files. All these files are placed in a temporary directory (`dir_tmp`). The temporary directory's contents can be inspected for troubleshooting, if necessary.
The Nextflow `work` directory has the originals of all temporary and output files.
For example, here is the disk usage of the directories from a run of the vignette as documented in Map mRNA and ribosome protected reads to transcriptome and collect data into an HDF5 file:
| Directory | MB |
| --- | --- |
| `vignette/index` | 9 |
| `vignette/tmp` | 813 |
| `vignette/output` | 3 |
| `work` | 826 |
| Total | 1651 |
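Figures like these can be gathered with `du`. A minimal sketch, assuming the vignette's directory layout (substitute the directories from your own configuration):

```shell
# Report per-directory disk usage in MB after a vignette run.
# Directory names assume the vignette configuration; adjust to your own.
for d in vignette/index vignette/tmp vignette/output work; do
  [ -d "$d" ] && du -sm "$d" || true
done
```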
Tip: We recommend that you regularly delete the temporary and `work` directories once you have completed your analyses to your satisfaction.
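Under the vignette layout, for example, the cleanup might look like the following (the paths are from the vignette configuration; substitute your own `dir_tmp` and Nextflow `work` directory, and note that this permanently deletes files):

```shell
# Remove the temporary and Nextflow work directories after a completed run.
# Paths assume the vignette layout; this permanently deletes these files.
rm -rf vignette/tmp work
```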