Download subset of genome sequences for selected tree nodes #1110
Comments
This feature could be made part of auspice and made an "opt-out" extension (or opt-in) so that different implementations can choose whether or not to expose it. I know others have asked for it, so I think it would get used. Happy for someone to implement it, but it's not something we (nextstrain.org) can pursue currently. Just thinking about it briefly, it would involve a new API call to fetch the sequences, subset them, and download them. Or you could post the subsetted strain list and ask for a matching sequences file from the server. There would be memory/speed considerations here as sequence data can be very large, comes in different formats (VCF, fasta) etcetera. I don't think making a new JSON sequence format would be recommended. Currently one can download a metadata TSV subsetted appropriately, which you could then use to get the sequences you want via a script (or a different web API etc). I appreciate that it may be nicer to do it all within auspice, but there may be easier short-term solutions.
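As a rough illustration of the short-term workaround mentioned above (download the subsetted metadata TSV, then pull the matching sequences with a script), here is a minimal Python sketch. It assumes the TSV has a column named `strain` holding the sequence names and that the sequences are in a plain FASTA file whose header IDs match those names; both are assumptions, and the function names are hypothetical. The FASTA is streamed line by line so a large file does not need to fit in memory.

```python
import csv


def read_strain_names(metadata_tsv, column="strain"):
    """Collect strain names from an open metadata TSV file.

    The column name "strain" is an assumption; adjust it to match
    the header of the TSV you actually downloaded.
    """
    reader = csv.DictReader(metadata_tsv, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def subset_fasta(fasta_in, fasta_out, wanted):
    """Copy FASTA records whose ID is in `wanted` from fasta_in to fasta_out.

    The record ID is taken as the first whitespace-delimited token
    after ">". Returns the number of records written.
    """
    keep = False
    written = 0
    for line in fasta_in:
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            keep = name in wanted
            if keep:
                written += 1
        if keep:
            fasta_out.write(line)
    return written
```

To use it, open the downloaded TSV and the full FASTA with `open(...)`, pass the TSV handle to `read_strain_names`, then call `subset_fasta` with the FASTA handle, an output handle, and the resulting set. VCF input would need different handling and is not covered here.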
We used to rely on this to extract mutations to display genotypes as I remember (it was >2 years ago). It's on the horizon for us to implement fetching of one (ancestral) sequence which we need to colour the tree by a position which has no observed mutations. It will probably be in |
Thanks for the comments.
Sounds like something worth pursuing. I can probably arrange some time to do this, depending on task priority in the COVID19 project I am working with.
I already wrote a short bash script to do the extraction, but I agree, |
@jameshadfield I made some progress hacking on the feature. I need your comments and guidance:
|
Bumping this feature request. If we have I've encountered multiple people now where even just having Having this option makes complete sense for NCBI analyses like https://nextstrain.org/rabies or https://nextstrain.org/oropouche. We may need to give an opt-in / opt-out option however as authors of datasets like https://nextstrain.org/groups/inrb-mpox/clade-I or https://nextstrain.org/community/inrb-drc/ebola-nord-kivu may prefer users to download data through GitHub, etc... |
Context
Similar to downloading a subset of metadata, we want to extract a subset of genome sequences for further analysis. This might not be helpful or allowed in the global nextstrain/auspice instance, but for our local one it is a legal and useful feature to have.
Description
This feature should work almost exactly like extracting a subset of metadata.
Possible solution
augur export needs to export and include genome sequences upon user choice (i.e. an --export-genomes flag).
If auspice can find files ending with -sequences.json in datasetDir, it presents a download subset of sequences button in the Download Data popup window.
An observation: auspice removed handling of
sequences.JSON
at version 1.8.0. This feature is probably related to the code that handled sequences.
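The detection step proposed above (show the button only when a matching sequences file exists) could look roughly like the following. This is a Python sketch of the idea only: the auspice server itself is JavaScript, the `-sequences.json` suffix is taken from the proposal rather than from any current auspice behavior, and `datasets_with_sequences` is a hypothetical helper name.

```python
from pathlib import Path

# Suffix taken from the proposal in this issue; an assumption, not a
# file-naming convention that current auspice is known to use.
SEQUENCES_SUFFIX = "-sequences.json"


def datasets_with_sequences(dataset_dir):
    """Return the sorted dataset names that have a sequences file.

    A dataset "zika" counts as having sequences if the directory
    contains a regular file named "zika-sequences.json".
    """
    directory = Path(dataset_dir)
    return sorted(
        path.name[: -len(SEQUENCES_SUFFIX)]
        for path in directory.glob("*" + SEQUENCES_SUFFIX)
        if path.is_file()
    )
```

A server could run this check once at startup and expose the result to the client, which would then decide whether to render the extra download button.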