using fastmultigather to do contig-level gather and taxonomy assignment - a brief tutorial #3095
Comments
Hi @ctb, Good news is that The error message as follow:
PS the content of test.sh is: the expected output is Thank you in advance, |
ok, took me a second 😆 😭 and apologies for the complicated answer. This should be resolved in the next few weeks... but for now... it's a bit of a mess. Question: are you using a rocksdb index? The current release of the plugin, v0.9.3, only supports full gather output when using This will be updated in the next release, since sourmash-bio/sourmash_plugin_branchwater#298 was merged! However, the bad news is that testing has since revealed that SO, for now, the solution is: use I'll update you here when we have fixed the problems and released a new version. Apologies, things got tricky with all our different efforts to speed things up! Related issue: |
note: as of sourmash_plugin_branchwater v0.9.5 link, the results from |
Hi all - thanks for this wonderful tool! I have been using the most recent plugin distribution from conda-forge, but it still seems to hit an error similar to the one above, which is also discussed in plugin issue 330.
I'll add that I tried the tutorial listed above and it completed as expected. |
hi @mpgriesh sorry for taking so long to get back to you! I tried reproducing the error and couldn't - I'm curious what command you're running in And this is also a good reminder to update this issue - I'll do that tomorrow! |
No problem at all @ctb! Thanks again for such a thoughtful and useful suite of tools. I'm really excited about the functionality here. My apologies for not clarifying that command before. I first ran:

(sourmash) sourmash sketch dna --name-from-first inputs/*.fa -p k=51 -o genomes_k51.zip

And then the command run by run_sourmash_gather.sh:

(branchwater) sourmash scripts fastmultigather /mpgriesh/data/sourmash/genomes_k51.zip gtdb-rs214-k51.rocksdb/ -o /mpgriesh/data/sourmash/gather_genomes_k51.csv -k 51 -m DNA -c 16 |
thanks! and hmmmm... is it possible that you have several queries with the same name or identifier? |
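One way to check the duplicate-name question from the output side is sketched below. It is only an illustration, not part of sourmash: it assumes the gather CSV has query_name and query_md5 columns (adjust the defaults if yours differ) and the function name is made up for this example. A query name that maps to more than one md5 suggests that distinct queries share that name.

```python
import csv
from collections import defaultdict


def names_with_multiple_md5s(csv_path, name_col="query_name", md5_col="query_md5"):
    """Return {query name: sorted md5s} for names seen with more than one md5.

    The column names are assumptions about the CSV layout, not a documented
    guarantee; pass different ones if your file uses other headers.
    """
    md5s_by_name = defaultdict(set)
    with open(csv_path, newline="") as fp:
        for row in csv.DictReader(fp):
            md5s_by_name[row[name_col]].add(row[md5_col])
    # Repeated rows for one query (one row per match) are expected;
    # only names tied to several different md5s are reported.
    return {
        name: sorted(md5s)
        for name, md5s in md5s_by_name.items()
        if len(md5s) > 1
    }
```

For example, names_with_multiple_md5s("gather_genomes_k51.csv") returns an empty dict when every query name corresponds to a single sketch.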
The file names are all SRA experiment accessions and are unique. Here is the fastmultigather output:
This appears correct to me. I have two separate conda environments - sourmash and the branchwater_plugin. Is it possible the versions are incompatible? I used the sourmash distribution for sketch and index. |
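To compare the two conda environments, a small script like the following could be run once in each of them. This is a sketch under assumptions: the distribution names sourmash and sourmash_plugin_branchwater are what I would expect pip/conda to register, and the helper name is hypothetical. It only reports what is installed; it does not decide whether two versions are compatible.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("sourmash", "sourmash_plugin_branchwater")):
    """Return {package: version string, or None if it is not installed here}.

    The default package names are assumptions about how the distributions
    are registered in the environment.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this inside each environment and compare the printed versions.
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```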
On re-reading the example, not much needed to be changed - I just updated the text a bit. |
@mpgriesh I put your CSV in a file and ran:
so it seems to be working now 🤔 . I must admit to some confusion in figuring out how and why this didn't work in your situation, because a dive into the related issues suggests that the only thing that we fixed was in the generation of the CSV - and the CSV that you copy/pasted works fine 😭 . BUT... wait... I see Update: some of the behavior we addressed in |
How to get
This command is failing.
|
hi! You'll need to install the branchwater plugin - the easiest is via conda,
let me know if that works (or doesn't)! |
Using |
Based on docs, for taxonomic classification when I ran this command, it is not able to find any matches
Sample file: https://github.com/ChillarAnand/avilpage.com/tree/master/scripts/data
For the same sample, with kraken, this is the classification. From the above tutorial, I cannot understand how to process a single sample and do taxonomic classification. |
it looks like you did everything right! Two things to try -
|
Hi, I came across this thread when searching for a solution, and I just want to comment on @ChillarAnand's case. |
Ugh, you are completely correct - sorry about that and THANK YOU for correcting me! |
There's quite a bit of interest (#2816, #3070, #3089) in doing contig-level/long-read gather and (maybe) taxonomy assignment for contigs/long reads. Here's a short example that uses fastmultigather to do this.

The example uses contigs from genomes and searches them against the genomes themselves. This is just an example! You can totally replace podar-ref-singletons.zip with your own queries that would be from different genomes.

A few notes -
- fastmultigather is part of the branchwater plugin that can be installed with conda. See docs for fastmultigather specifically.
- fastmultigather when used with a rocksdb database (built with sourmash scripts index, below) will generate a single output file with -o. Indexing will take some time for large databases but it will be worth it ;). (ref rocksdb docs)
- If you run fastgather or fastmultigather against a zipfile, you'll get multiple output files, but otherwise things will still work.

fastmultigather quickstart using small data sets + rocksdb

hackmd for editing: https://hackmd.io/ztM-7ZJoSYahMMPde7Q5vw?view
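For the zipfile case mentioned in the notes, where a run leaves multiple output files, the pieces can be stitched into one table afterwards. The sketch below is a generic CSV concatenation, not something the plugin provides: the function name is made up, and the glob pattern you pass is an assumption about how your output files are named (the thread does not say).

```python
import csv
import glob
import os


def combine_csvs(pattern, combined_path):
    """Concatenate CSV files matching `pattern` into `combined_path`.

    Writes a single header, skips empty files, and returns the number of
    data rows written. Raises ValueError if the files disagree on columns.
    """
    target = os.path.abspath(combined_path)
    # Collect inputs up front and never read the output file itself.
    paths = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != target
    )
    rows_written = 0
    writer = None
    with open(combined_path, "w", newline="") as out_fp:
        for path in paths:
            with open(path, newline="") as in_fp:
                reader = csv.DictReader(in_fp)
                if reader.fieldnames is None:
                    continue  # file has no header line at all
                if writer is None:
                    writer = csv.DictWriter(out_fp, fieldnames=reader.fieldnames)
                    writer.writeheader()
                elif list(reader.fieldnames) != list(writer.fieldnames):
                    raise ValueError(f"column mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Usage would look like combine_csvs("results/*.csv", "all_gather.csv"), with both paths being placeholders for your own layout.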
Related issues:
- multigather documentation to be clearer, and to recommend fastmultigather #3069
- sourmash multigather for 5.0 #1614