
download_from_fauna: Add --prioritized_seqs_file option #203

Merged
merged 6 commits into from
Jan 27, 2025

@joverlee521 (Contributor) commented Jan 15, 2025

Description of proposed changes

Use the optional `prioritized_seqs_file` config param to define the file path for fauna's new `--prioritized_seqs_file` option, which downloads prioritized sequences by strain. Each file is expected to list the prioritized sequences for a single segment, so this follows the workflow's current setup of providing file paths with the `{segment}` wildcard in the config.

Includes skeleton files for all segments of h1n1pdm, h3n2, and vic that can be filled in as we define prioritized sequences.
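As a rough illustration of the `{segment}` wildcard convention, the sketch below resolves a per-segment path from a single config template and turns it into arguments for fauna's `--prioritized_seqs_file` option. Only the option name and config key come from this PR; the template path, the `SEGMENTS` list, and the helper `prioritized_seqs_args` are hypothetical, and the real Snakemake rule may wire this up differently.

```python
# Hypothetical config: the real path layout in this repo may differ.
CONFIG = {
    "prioritized_seqs_file": "config/prioritized_seqs/h3n2/{segment}.tsv",
}

# Hypothetical subset of segments, for illustration only.
SEGMENTS = ["ha", "na"]


def prioritized_seqs_args(config: dict, segment: str) -> list[str]:
    """Return fauna CLI arguments for one segment (hypothetical helper).

    The config value is a path template containing a {segment} wildcard,
    so one config entry covers every segment. If the option is unset,
    no arguments are added and fauna runs without prioritization.
    """
    template = config.get("prioritized_seqs_file")
    if not template:
        return []
    return ["--prioritized_seqs_file", template.format(segment=segment)]


if __name__ == "__main__":
    for segment in SEGMENTS:
        print(segment, prioritized_seqs_args(CONFIG, segment))
```

Keeping the wildcard in one templated value, rather than listing a path per segment, mirrors how the workflow already configures other per-segment files.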

Depends on

Related issue(s)

Part of nextstrain/fauna#174

Checklist

  • Checks pass

Use the optional `prioritized_seqs_file` config param to define the file
path for fauna's new `--prioritized_seqs_file` option, which downloads
prioritized sequences by strain. Each file is expected to list the
prioritized sequences for a single segment, so this follows the
workflow's current setup of providing file paths with the `{segment}`
wildcard in the config.

Includes skeleton files for all segments of h1n1pdm, h3n2, and vic
that can be filled in as we define prioritized sequences.
Based on @huddlej's recommendation in Slack¹:

DistrictOfColumbia/27/2023:
- EPI_ISL_18862356 for no passage
- EPI_ISL_19209054 for egg passaged

Croatia/10136RV/2023:
- EPI_ISL_19185072 for egg passaged

The accessions used in the file are the individual sequence accessions
rather than the GISAID EPI ISL, because the current data model does not
accurately track the GISAID EPI ISL.²

¹ <https://bedfordlab.slack.com/archives/C03KWDET9/p1734462580884249?thread_ts=1734438852.908179&cid=C03KWDET9>
² <nextstrain/fauna#165>
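For readers filling in the skeleton files, here is a minimal sketch of reading a strain-to-accession TSV. It assumes a headerless two-column layout (strain, accession), allows several accessions per strain (e.g. one per passage type), and skips blank or `#` lines so an empty skeleton parses cleanly. The column layout, the helper `read_prioritized_seqs`, and the placeholder accessions are assumptions, not fauna's documented format.

```python
import csv
import io
from collections import defaultdict


def read_prioritized_seqs(handle) -> dict[str, list[str]]:
    """Group accessions by strain from a headerless strain<TAB>accession file.

    Rows with fewer than two columns (including blank lines) and rows
    starting with '#' are skipped, so empty skeleton files parse to {}.
    """
    prioritized = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        strain, accession = row[0].strip(), row[1].strip()
        prioritized[strain].append(accession)
    return dict(prioritized)


# Placeholder accessions only; real files hold individual sequence accessions.
EXAMPLE_TSV = (
    "DistrictOfColumbia/27/2023\tACC_0001\n"
    "DistrictOfColumbia/27/2023\tACC_0002\n"
    "Croatia/10136RV/2023\tACC_0003\n"
)

if __name__ == "__main__":
    print(read_prioritized_seqs(io.StringIO(EXAMPLE_TSV)))
```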
@joverlee521 force-pushed the fauna-prioritized-seqs branch from 307760b to a4ca0f0 on January 22, 2025 19:09
@huddlej (Contributor) left a comment

Thanks, @joverlee521! I like this approach to prioritizing the sequences with a build config option. I only had two minor blocking requests about where the priority TSVs live and inclusion of a cell-passaged Croatia accession for H3N2. Otherwise, this looks good to go.

profiles/upload/h3n2/ha/prioritized_seqs_file.tsv (outdated, resolved)
profiles/upload.yaml (outdated, resolved)
workflow/snakemake_rules/download_from_fauna.smk (outdated, resolved)
Prioritizing EPI_ISL_19085723 sequences for the least-passaged
Croatia/10136RV, as suggested by @huddlej in review

<#203 (comment)>
Set the default value for `prioritized_seqs_file` to an empty list and
return the variable without conditionals, as suggested by @huddlej in
review
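A plain-Python sketch of why an empty-list default removes the need for conditionals: the getter hands back whatever is configured, and an empty list naturally means "no prioritized file" wherever a list of inputs is expected (Snakemake accepts an empty list as an input). Both function names are hypothetical, and `as_input_list` is only a stand-in for wildcard substitution, not Snakemake's own mechanism or the workflow's actual code.

```python
def get_prioritized_seqs_file(config: dict):
    """Return the configured path template, or [] if the option is unset.

    Because the default is an empty list, callers can use the value
    directly without first checking whether the option was provided.
    """
    return config.get("prioritized_seqs_file", [])


def as_input_list(value, **wildcards) -> list[str]:
    """Normalize a path template (or list of templates) into concrete paths.

    Hypothetical helper that mimics filling wildcards such as {segment};
    an empty list passes through as an empty list.
    """
    templates = [value] if isinstance(value, str) else list(value)
    return [template.format(**wildcards) for template in templates]


if __name__ == "__main__":
    unset = get_prioritized_seqs_file({})
    configured = get_prioritized_seqs_file({"prioritized_seqs_file": "seqs/{segment}.tsv"})
    print(as_input_list(unset, segment="ha"))
    print(as_input_list(configured, segment="ha"))
```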

<#203 (comment)>
Make these files more discoverable in the top-level config, as suggested
by @huddlej in review

<#203 (comment)>
@joverlee521 marked this pull request as ready for review January 27, 2025 19:15
@joverlee521 merged commit 9c918a5 into master Jan 27, 2025
3 checks passed
@joverlee521 deleted the fauna-prioritized-seqs branch January 27, 2025 19:19
@joverlee521 (Contributor, Author) commented:

Running the upload workflow to include the prioritized sequences in builds.

@joverlee521 (Contributor, Author) commented:

Confirmed that the public builds are using the prioritized HA/NA sequences for DC/27 and Croatia/10136RV.
A tangletree against the previous build shows a very slight difference for Croatia/10136RV.
