If using any part of this repo, please refer to:
Luisa W. Hugerth, Marcela Pereira, Yinghua Zha, Maike Seifert, Vilde Kaldhusdal, Fredrik Boulund, Maria C. Krog, , Zahra Bashir, Marica Hamsten, Emma Fransson, Henriette Svarre Nielsen, Ina Schuppe-Koistinen, and Lars Engstrand (2018) Assessment of In Vitro and In Silico Protocols for Sequence-Based Characterization of the Human Vaginal Microbiome mSphere, 5(6): e00448-20
db
16S:
-
optivag_db.aln.fasta.gz: aligned file with all 16S sequences used to simulate amplicons
-
optivag_db.fasta.gz: unaligned file with all 16S sequences used to simulate amplicons
-
optivag_seqinfo.csv: information on each of these sequences, including accession ID and taxonomy
genome_info:
-
bacteria_list.tsv: list of bacteria, needed for creating a database locally
-
updated_taxonomy.tsv: taxon names which changed since the inclusion in the database
tools
3 scripts, required for recreating the shotgun database from the files in genome_info
For instructions on how to create your local database, look here
A single script, extracts amplicons and reads of a given length, given forward and reverse primer sequences
Two scripts:
-
is_it_human.py: classifies reads in a fasta file as mapped or unmapped, given a reference file in UC format
-
make_roc_curve.py: classifies reads in one or more fastas files as correctly mapped, incorrectly mapped, correclty unmapped or incorrectly unmapped, given a reference file in UC or SAM format