analyze a metagenome using 66,000 GTDB genomic representatives #14
Comments
That is pretty cool. I tried it on a metagenome (very impressed with the speed, by the way), but I have a couple of questions. Based on the output below, I assume that it identified 1030126 signatures? Of those, only 65703 were retained. Why is that? That seems like very few: 65k out of about a million signatures is only about 6%. Then, when comparing to the k31 DB, 2581 gave matches, so I assume the rest is unclassified? I made the signature file by combining the two fastq files (like you showed in the previous thread). #output select query k=31 automatically.
Hi @sapuizait a few quick notes -
HTH!
Uggghhhhh - oh boy, you are right, I'm such an idiot - I gave a random number for a name and then forgot about it... sorry about that.
No worries ;). In re threshold, it's entirely up to you! See the discussion here: sourmash-bio/sourmash#2360 (comment). Note that we now have a much faster multithreaded gather available, too; see benchmarks.
Excellent - thanks!
This example uses the metagenome signature prepared in #12.
You'll also need to download the GTDB database as in #13.
Now, run `sourmash gather`. This should take about 5 minutes.
The output should look like this:
This is a minimum metagenome cover for the metagenome, based on the genomes in the GTDB database. In brief, it provides the shortest list of genomes that together contain all of the known content in the metagenome (in this case, about 4% of the metagenome).
Note: more of the metagenome might be matched if you used a larger database or a database that included eukaryotic and/or host sequence.
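To make the idea of a minimum metagenome cover more concrete, here is a minimal sketch of a greedy set-cover loop in plain Python. It is a toy illustration only, not sourmash's implementation: the function name `greedy_min_cover`, the genome names, and the integer "hashes" are all made up for the example, and real gather runs work on sketches and apply a match threshold. A greedy loop like this also only approximates the shortest possible list.

```python
def greedy_min_cover(query_hashes, genomes):
    """Greedy approximation of a minimum set cover.

    query_hashes: set of hashes from the (hypothetical) metagenome.
    genomes: dict mapping a genome name to its set of hashes.
    Returns (cover, unassigned), where cover is a list of
    (genome name, number of newly covered hashes) in pick order.
    """
    remaining = set(query_hashes)
    cover = []
    while remaining:
        # Pick the genome that shares the most not-yet-assigned hashes.
        best_name, best_overlap = None, set()
        for name, hashes in genomes.items():
            overlap = remaining & hashes
            if len(overlap) > len(best_overlap):
                best_name, best_overlap = name, overlap
        if not best_overlap:
            break  # nothing in the database matches what is left
        cover.append((best_name, len(best_overlap)))
        remaining -= best_overlap
    return cover, remaining


# Toy data: a metagenome with 100 hashes and three made-up genomes.
query = set(range(100))
genomes = {
    "genome_A": {0, 1, 2},
    "genome_B": {2, 3},
    "genome_C": {0, 1},  # fully redundant with genome_A
}

cover, unassigned = greedy_min_cover(query, genomes)
matched_fraction = (len(query) - len(unassigned)) / len(query)

print(cover)             # genomes in the cover, in pick order
print(matched_fraction)  # fraction of the metagenome that is covered
```

In this toy example, genome_C never enters the cover because everything it contains is already explained by genome_A, and most of the metagenome stays unassigned because no genome in the toy database contains it.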