kowallus/PgRC

PgRC: Pseudogenome based Read Compressor

Pseudogenome-based Read Compressor (PgRC) is an in-memory algorithm for compressing the DNA stream of FASTQ datasets, based on the idea of building an approximation of the shortest common superstring over high-quality reads.

The implementation supports only constant-length reads of at most 255 bases.
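
Because of this limit, it can be worth checking a dataset before compressing it. The following is a minimal sketch of such a check, not part of PgRC: the function name `check_read_lengths` is hypothetical, and it assumes plain (uncompressed) FASTQ with exactly four lines per record.

```shell
# Hypothetical pre-flight check, not part of PgRC.
# Assumes plain 4-line FASTQ records (header, bases, '+', qualities).
check_read_lengths() {
    awk '
        NR % 4 == 2 {
            # Line 2 of every record holds the bases.
            len = length($0)
            if (!seen) { first = len; seen = 1 }
            if (len != first) { print "variable read length at line " NR; bad = 1; exit }
            if (len > 255)    { print "read longer than 255 bases at line " NR; bad = 1; exit }
        }
        END {
            if (bad) exit 1
            if (!seen) { print "no reads found"; exit 1 }
            print "ok: constant read length " first
        }
    ' "$1"
}
```

Calling `check_read_lengths in.fastq` returns a non-zero status when the reads vary in length, exceed 255 bases, or the file contains no reads.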

Installation on Linux - manual build

The following steps create a PgRC executable. Building PgRC on Linux requires cmake version 3.5 or newer (check with cmake --version):

git clone https://github.com/kowallus/PgRC.git
cd PgRC
mkdir build
cd build
cmake ..
make PgRC
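
The cmake version requirement can also be checked from a script before building. This is a sketch only: `version_ge` and `cmake_version` are hypothetical helper names, and the comparison relies on `sort -V` from GNU coreutils.

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from the first line of `cmake --version`,
# which has the form "cmake version X.Y.Z".
cmake_version() {
    cmake --version | head -n 1 | awk '{ print $3 }'
}
```

For example, `version_ge "$(cmake_version)" 3.5 || echo "cmake is too old" >&2` prints a warning when the installed cmake is older than 3.5.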

Basic usage

PgRC [-i <seqSrcFile> [<pairSrcFile>]] [-t <noOfThreads>] [-o] [-d] <archiveName>
   
   -o preserve original read order information
   -t number of threads used
   -d decompression mode

compression of DNA stream in order non-preserving regime (SE mode):

./PgRC -i in.fastq comp.pgrc

compression of DNA stream in order preserving regime (SE_ORD mode):

./PgRC -o -i in.fastq comp.pgrc

compression of paired-end DNA stream in order non-preserving regime (PE mode):

./PgRC -i in1.fastq in2.fastq comp.pgrc

compression of paired-end DNA stream in order preserving regime (PE_ORD mode):

./PgRC -o -i in1.fastq in2.fastq comp.pgrc
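
The compression commands above differ only in the -o flag and the number of input files. The sketch below makes that mapping explicit by printing the command for a given mode. The helper `pgrc_compress_cmd` is hypothetical and not part of PgRC, the label PE_ORD for the order-preserving paired-end case is assumed by analogy with SE_ORD, and placing -t before -i is an assumption about option order.

```shell
# Hypothetical helper, not part of PgRC: print the compression command for a mode.
# Usage: pgrc_compress_cmd <SE|SE_ORD|PE|PE_ORD> <threads> <archive> <in1> [<in2>]
pgrc_compress_cmd() {
    mode=$1; threads=$2; archive=$3; in1=$4; in2=${5:-}
    order_flag=""
    case "$mode" in
        SE_ORD|PE_ORD) order_flag=" -o" ;;   # order-preserving regimes add -o
        SE|PE) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    case "$mode" in
        PE|PE_ORD)
            # Paired-end regimes take two input files after -i.
            [ -n "$in2" ] || { echo "paired-end mode needs two input files" >&2; return 1; }
            inputs="$in1 $in2" ;;
        *) inputs="$in1" ;;
    esac
    echo "./PgRC -t $threads$order_flag -i $inputs $archive"
}
```

The function only prints the command, so the result can be inspected before it is run, for example with `pgrc_compress_cmd PE_ORD 8 comp.pgrc in1.fastq in2.fastq`.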

decompression of DNA stream to the current folder:

./PgRC -d comp.pgrc

Publications

Tomasz M. Kowalski, Szymon Grabowski: PgRC: pseudogenome-based read compressor. Bioinformatics, Volume 36, Issue 7, pp. 2082–2089 (2020).

supplementary data

bioRxiv

Related projects

PgSA - Pseudogenome Suffix Array