download and search 66,000 GTDB genomes with a query genome #13
You'll need to build the genome signature file in #11 first.
Then, download the GTDB genomic representatives database:
This will create a 1.7 GB file, `gtdb-rs207.genomic-reps.dna.k31.zip`, which contains 66,000 genome sketches from the Genome Taxonomy Database, release 207.

Now search the genome against the GTDB database:
This will take about 5 minutes.
The output will look like this:
showing that this genome is, indeed, an E. coli genome :).
The similarity in the left column is Jaccard similarity, calculated using the k-mers in the query genome sketch against the k-mers in each of the database genome sketches.
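To make the Jaccard calculation concrete, here is a minimal Python sketch. It is an illustration only, not how sourmash is implemented: it uses exact k-mer sets from two made-up toy sequences with a small `k`, whereas sourmash compares subsampled sketches of hashed k-mers. The function names `kmers` and `jaccard` are hypothetical helpers, not part of any library.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Toy sequences (made up for illustration); real sketches use k=31.
query = kmers("ATGGCGTACGTTAGC", 4)
match = kmers("ATGGCGTACGATAGC", 4)

print(jaccard(query, query))  # identical sets give 1.0
print(jaccard(query, match))  # one substitution lowers the similarity
```

A value of 1.0 means the two k-mer sets are identical and 0.0 means they share nothing, so higher values in the left column indicate closer matches.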
You can increase the number of output results with `-n`, and you can record the results in a CSV file with `-o <output.csv>`.