download_from_fauna: Add --prioritized_seqs_file option #203
Use the optional `prioritized_seqs_file` config param to define the file path for fauna's new `--prioritized_seqs_file` option for downloading prioritized sequences by strain. Each file is expected to hold the prioritized sequences for a single segment, so this should follow the workflow's current setup of providing file paths with the `{segment}` wildcard in the config.
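As a rough illustration of the per-segment lookup described above, here is a minimal Python sketch. It is not the workflow's actual rule code: the function name `prioritized_seqs_args` and the example path are hypothetical, and only the `prioritized_seqs_file` config key, the `{segment}` wildcard, and the `--prioritized_seqs_file` option come from this PR.

```python
def prioritized_seqs_args(config: dict, segment: str) -> list[str]:
    """Build the extra download arguments for one segment (hypothetical helper)."""
    # The config param is optional, so a missing key means no extra arguments.
    template = config.get("prioritized_seqs_file")
    if not template:
        return []
    # Fill the {segment} wildcard to get the per-segment file path.
    path = template.format(segment=segment)
    return ["--prioritized_seqs_file", path]


# Hypothetical config entry; the real file locations are defined in the repo config.
example_config = {"prioritized_seqs_file": "config/h3n2/{segment}/prioritized_seqs.tsv"}
ha_args = prioritized_seqs_args(example_config, "ha")
no_args = prioritized_seqs_args({}, "na")
```

With this shape, one config entry covers every segment, and builds that do not set the param run the download unchanged.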
Follow the workflow's current setup of providing file paths with the `{segment}` wildcard in the config. Includes skeleton files for all segments of h1n1pdm, h3n2, and vic that can be filled in as we define prioritized sequences.
Based on @huddlej's recommendation in Slack¹

DistrictOfColumbia/27/2023:
- EPI_ISL_18862356 for no passage
- EPI_ISL_19209054 for egg passaged

Croatia/10136RV/2023:
- EPI_ISL_19185072 for egg passaged

The accessions used in the file are the specific sequence accessions and not the GISAID EPI ISL because the current data model does not accurately track the GISAID EPI ISL²

¹ <https://bedfordlab.slack.com/archives/C03KWDET9/p1734462580884249?thread_ts=1734438852.908179&cid=C03KWDET9>
² <nextstrain/fauna#165>
307760b to a4ca0f0
Thanks, @joverlee521! I like this approach to prioritizing the sequences with a build config option. I only had two minor blocking requests about where the priority TSVs live and inclusion of a cell-passaged Croatia accession for H3N2. Otherwise, this looks good to go.
Prioritizing EPI_ISL_19085723 sequences for least passaged Croatia/10136RV as suggested by @huddlej in review <#203 (comment)>
Set the default value for `prioritized_seqs_file` to an empty list and return the variable without conditionals as suggested by @huddlej in review <#203 (comment)>
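The idea behind this change can be sketched as follows. This is a simplified stand-in and not the workflow's real input function: the name `prioritized_seqs_input` and the example path are assumptions for illustration. An empty-list default means callers get "no files" when the param is unset, so no branch on presence is needed.

```python
def prioritized_seqs_input(config: dict):
    """Return the configured path template, or an empty list if unset (hypothetical helper)."""
    # Defaulting to [] lets the caller treat "unset" as "no input files"
    # without any conditional around the lookup.
    return config.get("prioritized_seqs_file", [])


unset_value = prioritized_seqs_input({})
set_value = prioritized_seqs_input({"prioritized_seqs_file": "config/{segment}.tsv"})
```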
Make these files more discoverable in the top level config as suggested by @huddlej in review <#203 (comment)>
Running the upload workflow to include the prioritized sequences in builds.
Confirmed public builds are using the prioritized HA/NA seqs for DC/27 and Croatia/10136RV.
Description of proposed changes
Use the optional `prioritized_seqs_file` config param to define the file path for fauna's new `--prioritized_seqs_file` option for downloading prioritized sequences by strain. Each file is expected to hold the prioritized sequences for a single segment, so this follows the workflow's current setup of providing file paths with the `{segment}` wildcard in the config.

Includes skeleton files for all segments of h1n1pdm, h3n2, and vic that can be filled in as we define prioritized sequences.
Depends on `--prioritized_seqs_file` option fauna#176
Related issue(s)
Part of nextstrain/fauna#174
Checklist