Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix silent parse bug #1712

Merged
merged 3 commits into from
Jan 14, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@
* `augur curate rename --field-map`
* `augur curate transform-strain-name --backup-fields`
* `augur curate format-dates --expected-date-formats` help text has been improved with clarifications regarding how values provided interact with builtin formats and how to match masked date parts. [#1707][], [#1718][] (@victorlin)
* parse: Transform strain names the same way in both metadata and sequences instead of only transforming sequences. [#1712][] (@huddlej)

[#1688]: https://github.com/nextstrain/augur/pull/1688
[#1690]: https://github.com/nextstrain/augur/pull/1690
[#1707]: https://github.com/nextstrain/augur/issues/1707
[#1712]: https://github.com/nextstrain/augur/pull/1712
[#1715]: https://github.com/nextstrain/augur/pull/1715
[#1716]: https://github.com/nextstrain/augur/pull/1716
[#1718]: https://github.com/nextstrain/augur/pull/1718
Expand Down
4 changes: 2 additions & 2 deletions augur/parse.py
Original file line number Diff line number Diff line change
Expand Up @@ -138,8 +138,8 @@ def parse_sequence(
sequence_fields = map(str.strip, sequence.description.split(separator))
metadata = dict(zip(fields, sequence_fields))

tmp_name = metadata[strain_key].translate(forbidden_characters)
sequence.name = sequence.id = tmp_name
metadata[strain_key] = metadata[strain_key].translate(forbidden_characters)
sequence.name = sequence.id = metadata[strain_key]
sequence.description = ''

if prettify_fields:
Expand Down
2 changes: 1 addition & 1 deletion tests/functional/parse.t
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ Parse Zika sequences into sequences and metadata using a different metadata fiel
> --fix-dates monthfirst

$ diff -u "parse/sequences_other.fasta" "$TMP/sequences.fasta"
$ diff -u "parse/metadata.tsv" "$TMP/metadata.tsv"
$ diff -u "parse/metadata_other.tsv" "$TMP/metadata.tsv"
$ rm -f "$TMP/sequences.fasta" "$TMP/metadata.tsv"

Try to parse Zika sequences with a misspelled field.
Expand Down
13 changes: 13 additions & 0 deletions tests/functional/parse/metadata_other.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
strain virus accession date region country division city db segment authors url title journal paper_url
PAN/CDC_259359( V1 )V3/2015 zika KX156774 2015-12-18 North America Panama Panama Panama genbank genome Shabman et al https://www.ncbi.nlm.nih.gov/nuccore/KX156774 Direct Submission Submitted (29-APR-2016) J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA https://www.ncbi.nlm.nih.gov/pubmed/
COL/FLR_00024/2015 zika MF574569 2015-12-XX South America Colombia Colombia Colombia genbank genome Pickett et al https://www.ncbi.nlm.nih.gov/nuccore/MF574569 Direct Submission Submitted (28-JUL-2017) J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA https://www.ncbi.nlm.nih.gov/pubmed/
PRVABC59 zika KU501215 2015-12-XX North America Puerto Rico Puerto Rico Puerto Rico genbank genome Lanciotti et al https://www.ncbi.nlm.nih.gov/nuccore/KU501215 Phylogeny of Zika Virus in Western Hemisphere, 2015 Emerging Infect. Dis. 22 (5), 933-935 (2016) https://www.ncbi.nlm.nih.gov/pubmed/27088323
COL/FLR_00008/2015 zika MF574562 2015-12-XX South America Colombia Colombia Colombia genbank genome Pickett et al https://www.ncbi.nlm.nih.gov/nuccore/MF574562 Direct Submission Submitted (28-JUL-2017) J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA https://www.ncbi.nlm.nih.gov/pubmed/
Colombia/2016/ZC204Se zika KY317939 2016-01-06 South America Colombia Colombia Colombia genbank genome Quick et al https://www.ncbi.nlm.nih.gov/nuccore/KY317939 Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples Nat Protoc 12 (6), 1261-1276 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28538739
ZKC2/2016 zika KX253996 2016-02-16 Oceania American Samoa American Samoa American Samoa genbank genome Wu et al https://www.ncbi.nlm.nih.gov/nuccore/KX253996 Direct Submission Submitted (18-MAY-2016) Center for Diseases Control and Prevention of Guangdong Province; National Institute of Viral Disease Control and Prevention, China https://www.ncbi.nlm.nih.gov/pubmed/
VEN/UF_1/2016 zika KX702400 2016-03-25 South America Venezuela Venezuela Venezuela genbank genome Blohm et al https://www.ncbi.nlm.nih.gov/nuccore/KX702400 Complete Genome Sequences of Identical Zika virus Isolates in a Nursing Mother and Her Infant Genome Announc 5 (17), e00231-17 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28450510
DOM/2016/BB_0059 zika KY785425 2016-04-04 North America Dominican Republic Dominican Republic Dominican Republic genbank genome Metsky et al https://www.ncbi.nlm.nih.gov/nuccore/KY785425 Zika virus evolution and spread in the Americas Nature 546 (7658), 411-415 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28538734
BRA/2016/FC_6706 zika KY785433 2016-04-08 South America Brazil Brazil Brazil genbank genome Metsky et al https://www.ncbi.nlm.nih.gov/nuccore/KY785433 Zika virus evolution and spread in the Americas Nature 546 (7658), 411-415 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28538734
DOM/2016/BB_0183 zika KY785420 2016-04-18 North America Dominican Republic Dominican Republic Dominican Republic genbank genome Metsky et al https://www.ncbi.nlm.nih.gov/nuccore/KY785420 Zika virus evolution and spread in the Americas Nature 546 (7658), 411-415 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28538734
EcEs062_16 zika KX879603 2016-04-XX South America Ecuador Ecuador Ecuador genbank genome Marquez et al https://www.ncbi.nlm.nih.gov/nuccore/KX879603 First Complete Genome Sequences of Zika Virus Isolated from Febrile Patient Sera in Ecuador Genome Announc 5 (8), e01673-16 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28232448
HND/2016/HU_ME59 zika KY785418 2016-05-13 North America Honduras Honduras Honduras genbank genome Metsky et al https://www.ncbi.nlm.nih.gov/nuccore/KY785418 Zika virus evolution and spread in the Americas Nature 546 (7658), 411-415 (2017) https://www.ncbi.nlm.nih.gov/pubmed/28538734
Loading
Loading