
Global Settings

Jouni Siren edited this page Nov 1, 2024 · 4 revisions

Verbosity

GBWT construction writes status information to stderr. There are four verbosity levels, which can be set with Verbosity::set(size_type new_level):

Level                Value  Description
Verbosity::SILENT    0      No status information (default)
Verbosity::BASIC     1      Basic progress information and statistics on the input and the final index
Verbosity::EXTENDED  2      Adds intermediate statistics for each batch
Verbosity::FULL      3      Adds detailed information for each batch
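In a program linked against GBWT, enabling progress output is a single call such as Verbosity::set(Verbosity::BASIC) before construction starts. The self-contained sketch below does not use GBWT. It defines a hypothetical stand-in, sketch::Verbosity, that mirrors the levels in the table, along with a hypothetical status() helper, to show how a global level threshold decides which messages reach stderr. The clamping of out-of-range levels is this sketch's own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

namespace sketch {

using size_type = std::size_t;

// Hypothetical stand-in mirroring the verbosity table; not GBWT's own class.
struct Verbosity {
  static constexpr size_type SILENT = 0;
  static constexpr size_type BASIC = 1;
  static constexpr size_type EXTENDED = 2;
  static constexpr size_type FULL = 3;

  static size_type level;  // global level, SILENT by default

  // This sketch clamps out-of-range values to FULL.
  static void set(size_type new_level) { level = (new_level > FULL ? FULL : new_level); }
};

size_type Verbosity::level = Verbosity::SILENT;

// Write a status message to stderr only if the global level is high enough.
// Returns true if the message was printed.
bool status(size_type required_level, const std::string& message) {
  assert(required_level > Verbosity::SILENT);  // SILENT messages would never be useful
  if (Verbosity::level < required_level) { return false; }
  std::cerr << message << std::endl;
  return true;
}

}  // namespace sketch
```

Because the level is global, a natural place to set it is once near the start of the program, before any construction begins.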

Temporary files

By default, temporary files are written to the current working directory, but the directory can be changed with TempFile::setDirectory(const std::string& directory).

GBWT deletes the temporary files under normal circumstances. If the program crashes (e.g. due to invalid data or running out of memory) without calling std::exit(), some files may remain.
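The cleanup behavior matches a standard C++ mechanism: handlers registered with std::atexit run when main() returns or std::exit() is called, but not when the process crashes or calls std::abort(). The sketch below is a hypothetical stand-in, not GBWT's TempFile class. The names tempsketch, set_directory, create, and cleanup are illustrative, and whether GBWT itself relies on std::atexit is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical sketch of temporary-file handling; not GBWT's TempFile class.
namespace tempsketch {

namespace fs = std::filesystem;

// Directory for new temporary files; the current directory by default.
std::string& temp_directory() {
  static std::string directory = ".";
  return directory;
}

void set_directory(const std::string& directory) { temp_directory() = directory; }

// Paths of files that still need to be deleted. Intentionally never destroyed,
// so that cleanup() can still use it when it runs as an atexit handler.
std::vector<std::string>& registry() {
  static std::vector<std::string>* files = new std::vector<std::string>();
  return *files;
}

// Delete every registered file. Safe to call more than once.
void cleanup() {
  for (const std::string& path : registry()) {
    std::error_code ec;  // ignore errors, e.g. a file that was already removed
    fs::remove(path, ec);
  }
  registry().clear();
}

// Create an empty file in the temporary directory and register it for deletion.
// On the first call, cleanup() is registered with std::atexit: it runs when main()
// returns or std::exit() is called, but not after a crash or std::abort().
std::string create(const std::string& name) {
  assert(!name.empty());
  static const bool registered = (std::atexit(cleanup) == 0);
  (void)registered;
  std::string path = (fs::path(temp_directory()) / name).string();
  std::ofstream(path).close();
  registry().push_back(path);
  return path;
}

}  // namespace tempsketch
```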

Haplotype generation

The haplotype generation interface stores phasing information in temporary files in order to save memory. Their total size is typically similar to that of the .vcf.gz file, though it depends on the number of samples per file. Because the files are run-length encoded, storing more samples per file reduces disk usage but increases memory usage. In some cases, the files can be several times larger than the .vcf.gz file. For example, run-length encoding does not help with human chromosome X if male and female samples are randomly interleaved.
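One way to see the chromosome X effect is a toy model, assumed here for illustration only and not the actual phasing file format. Suppose a column records the ploidy of each sample on the non-pseudoautosomal part of chromosome X: 2 for female samples, 1 for male samples. A run-length encoder stores each maximal run of equal values once, so every change between a male and a female sample starts a new run. Grouping samples by sex leaves only two runs. The function name run_length_encode is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy run-length encoder producing (value, run length) pairs.
// An illustration only; not GBWT's phasing file format.
std::vector<std::pair<int, std::size_t>> run_length_encode(const std::vector<int>& values) {
  std::vector<std::pair<int, std::size_t>> runs;
  for (int value : values) {
    if (!runs.empty() && runs.back().first == value) {
      ++runs.back().second;  // extend the current run
    } else {
      runs.emplace_back(value, 1);  // a value change starts a new run
    }
  }
  return runs;
}
```

With ploidies {2, 1, 1, 2, 1, 2, 2, 1}, the encoder produces six runs; the same samples ordered as {2, 2, 2, 2, 1, 1, 1, 1} produce two.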

The naming scheme for phasing files is phasing_host_process_counter (e.g. phasing_vr-4-1-14_15606_40).
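The naming scheme can be reproduced with a small hypothetical helper, temp_file_name, which is not part of the GBWT API. The host name, process id, and counter are passed in explicitly. In a real program they might come from POSIX gethostname() and getpid() plus a per-process counter, but that is an assumption about how GBWT obtains them. Including the host and process presumably keeps concurrent runs that share a temporary directory from colliding. The same scheme with the prefix ranks covers the rank array files used by fast merging.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper reproducing the documented naming scheme
// prefix_host_process_counter; not part of the GBWT API.
std::string temp_file_name(const std::string& prefix, const std::string& host,
                           long process_id, std::size_t counter) {
  assert(!prefix.empty() && !host.empty());
  return prefix + "_" + host + "_" + std::to_string(process_id) + "_" +
         std::to_string(counter);
}
```

For example, temp_file_name("phasing", "vr-4-1-14", 15606, 40) returns "phasing_vr-4-1-14_15606_40".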

Fast merging

The fast merging algorithm writes the rank array to disk in a number of temporary files. Each file contains a gap-encoded sorted subset of the rank array. The total size of these files is 2-3 bytes times the total length of the sequences in the inserted (smaller) GBWT.
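A minimal sketch of gap encoding a sorted subset, assuming a byte-oriented variable-length code (the actual GBWT on-disk encoding may differ): the sketch stores the first value and then the differences between consecutive values, each as a base-128 varint with 7 data bits per byte and the high bit marking that more bytes follow. Small gaps fit in a single byte. The function names gap_encode and gap_decode are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: gap-encode a sorted (non-decreasing) sequence as
// base-128 varints. Not GBWT's on-disk format.
std::vector<std::uint8_t> gap_encode(const std::vector<std::uint64_t>& sorted) {
  std::vector<std::uint8_t> bytes;
  std::uint64_t previous = 0;
  for (std::uint64_t value : sorted) {
    assert(value >= previous);  // input must be sorted
    std::uint64_t gap = value - previous;
    previous = value;
    while (gap >= 0x80) {  // emit 7 bits at a time, low bits first
      bytes.push_back(static_cast<std::uint8_t>((gap & 0x7F) | 0x80));
      gap >>= 7;
    }
    bytes.push_back(static_cast<std::uint8_t>(gap));  // last byte: high bit clear
  }
  return bytes;
}

// Decode the bytes back into the original sorted values.
std::vector<std::uint64_t> gap_decode(const std::vector<std::uint8_t>& bytes) {
  std::vector<std::uint64_t> values;
  std::uint64_t previous = 0, gap = 0;
  unsigned shift = 0;
  for (std::uint8_t byte : bytes) {
    gap |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if (byte & 0x80) {
      shift += 7;  // more bytes follow for this gap
    } else {
      previous += gap;  // gap complete: recover the next value
      values.push_back(previous);
      gap = 0;
      shift = 0;
    }
  }
  return values;
}
```

As a rough worked example of the size estimate above: if the sequences in the inserted GBWT have a total length of 10^9, the rank array files would take about 2 to 3 GB in total.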

The naming scheme for rank array files is ranks_host_process_counter (e.g. ranks_vr-4-1-14_15606_40).
