Contigs for all SRA accessions were constructed directly from the Logan unitigs. The unitigs were given to Minia3, which performed SPAdes-inspired de Bruijn graph simplifications using a single k-mer size, k=31. In total, 26.7 million accessions were processed.
Contigs are stored at the following location:
s3://logan-pub/c/[accession]/[accession].contigs.fa.zst
Careful, this S3 bucket is huge. The total size of all unitigs is 385 terabytes compressed. It contains 26.7M files. Just listing the folder will take half an hour.
To download one accession, type:
wget https://s3.amazonaws.com/logan-pub/c/[accession]/[accession].contigs.fa.zst
e.g. for accession SRR11905265, type:
wget https://s3.amazonaws.com/logan-pub/c/SRR11905265/SRR11905265.contigs.fa.zst
For faster downloads, use the AWS CLI (you do not need an AWS account):
aws s3 cp s3://logan-pub/c/[accession]/[accession].contigs.fa.zst . --no-sign-request
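To fetch several accessions in one go, the command above can be wrapped in a small loop. The following is only a sketch: the function names `logan_contigs_uri` and `download_contigs` and the file name `accessions.txt` are made up for illustration, and the list file is assumed to hold one accession per line.

```shell
# Print the public S3 URI of the contigs file for one accession.
logan_contigs_uri() {
  printf 's3://logan-pub/c/%s/%s.contigs.fa.zst\n' "$1" "$1"
}

# Download the contigs of every accession listed in a file (one per line).
# Blank lines are skipped; files are written to the current directory.
download_contigs() {
  while IFS= read -r acc; do
    [ -n "$acc" ] || continue
    # </dev/null keeps the aws command from consuming the accession list.
    aws s3 cp "$(logan_contigs_uri "$acc")" . --no-sign-request </dev/null
  done < "$1"
}

# Usage (accessions.txt is a hypothetical file name):
#   download_contigs accessions.txt
```

Building the path from the accession avoids listing the bucket, which is slow given the number of files it contains.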
To decompress a single contigs file, type:
zstd -d [accession].contigs.fa.zst
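When the decompressed file is not needed on disk, the contigs can be streamed instead. This sketch assumes `zstd` is installed; `count_contigs` is a hypothetical helper name, and it simply counts FASTA header lines arriving on standard input.

```shell
# Count FASTA records (lines starting with '>') read from standard input.
count_contigs() {
  grep -c '^>'
}

# Usage: decompress to stdout (-d decompress, -c write to stdout) and count.
#   zstd -dc SRR11905265.contigs.fa.zst | count_contigs
#
# The download can be streamed as well, since "aws s3 cp" accepts "-" as
# the destination to write the object to stdout:
#   aws s3 cp s3://logan-pub/c/SRR11905265/SRR11905265.contigs.fa.zst - \
#     --no-sign-request | zstd -dc | count_contigs
```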
The headers carry the same metadata as in the unitigs; see the Unitigs page for an explanation of:
>[accession]_[counter] ka:f:[abundance] L:i:[..]
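As an illustration of this header layout, the abundance of each contig can be pulled out with `awk`. This is a sketch that relies only on the `ka:f:` field shown above; `contig_abundances` is a hypothetical function name, and any other header fields are ignored.

```shell
# Read FASTA from standard input and print, for each header line,
# the contig identifier and its ka:f: abundance, separated by a tab.
# Prints "NA" when a header has no ka:f: field.
contig_abundances() {
  awk '/^>/ {
    id = substr($1, 2)                 # drop the leading ">"
    ab = "NA"
    for (i = 2; i <= NF; i++)
      if ($i ~ /^ka:f:/) ab = substr($i, 6)   # text after "ka:f:"
    print id "\t" ab
  }'
}

# Usage:
#   zstd -dc SRR11905265.contigs.fa.zst | contig_abundances
```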
Contigs do not enjoy the same theoretical guarantees as the unitigs, with one exception: any 31-mer present in the contigs is guaranteed to also appear in the reads. Abundances are reported in the same way as in the unitigs.
To recover the .gfa assembly graph, follow the same procedure as in the Unitigs page.