Merge pull request #26 from nextstrain/final-final-no-i-mean-it-25
Final tweaks [#25]
genehack authored Dec 9, 2024
2 parents 32afa98 + efaeb6e commit 34a0870
Showing 7 changed files with 31 additions and 15 deletions.
1 change: 1 addition & 0 deletions ingest/defaults/config.yaml
@@ -128,6 +128,7 @@ curate:
  - full_authors
  - authors
  - institution
+ - url
  nextclade:
  dataset_name: "nextstrain/yellow-fever/prM-E"
  field_map: "defaults/nextclade_field_map.tsv"
19 changes: 18 additions & 1 deletion ingest/rules/curate.smk
@@ -59,7 +59,7 @@ rule curate:
  all_geolocation_rules="data/all-geolocation-rules.tsv",
  annotations=config["curate"]["annotations"],
  output:
- metadata="data/all_metadata.tsv",
+ metadata=temp("data/all_metadata_intermediate.tsv"),
  sequences="results/sequences.fasta",
  log:
  "logs/curate.txt",
@@ -116,6 +116,23 @@ rule curate:
  """


+ rule add_genbank_url:
+ input:
+ metadata=temp("data/all_metadata_intermediate.tsv"),
+ output:
+ metadata="data/all_metadata.tsv",
+ log:
+ "logs/add_genbank_url.txt",
+ benchmark:
+ "benchmarks/add_genbank_url.txt",
+ shell:
+ r"""
+ csvtk mutate2 -t \
+ -n url \
+ -e '"https://www.ncbi.nlm.nih.gov/nuccore/" + $accession' \
+ {input.metadata:q} > {output.metadata:q} 2> {log:q}
+ """
+
  rule subset_metadata:
  input:
  metadata="data/all_metadata.tsv",
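
Note on the new `add_genbank_url` rule: it appends a `url` column whose value is the NCBI nuccore prefix followed by the row's `accession`. A minimal Python sketch of that transformation is given here for illustration only. The function name `add_genbank_url` and the sample row are hypothetical, and the sketch assumes the input TSV has an `accession` column and no existing `url` column; the workflow itself uses `csvtk mutate2`, not this code.

```python
import csv
import io

# Prefix taken from the rule's csvtk expression.
NUCCORE_PREFIX = "https://www.ncbi.nlm.nih.gov/nuccore/"


def add_genbank_url(tsv_text: str) -> str:
    """Return the TSV with a trailing `url` column built from `accession`.

    Hypothetical helper: assumes an `accession` column is present and
    that no `url` column exists yet.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames or []) + ["url"]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # String concatenation, as in the rule's -e expression.
        row["url"] = NUCCORE_PREFIX + row["accession"]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    print(add_genbank_url("accession\tcountry\nAB123456\tBrazil\n"), end="")
```

Because the rule writes the final `data/all_metadata.tsv`, downstream rules such as `subset_metadata` keep reading the same path as before the change.
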
4 changes: 2 additions & 2 deletions ingest/rules/nextclade.smk
@@ -64,9 +64,9 @@ rule join_metadata_and_nextclade:
  (
  export SUBSET_FIELDS=`grep -v '^#' {input.nextclade_field_map} | awk '{{print $1}}' | tr '\n' ',' | sed 's/,$//g'`
- csvtk -tl cut -f $SUBSET_FIELDS \
+ csvtk -t cut -f $SUBSET_FIELDS \
  {input.nextclade} \
- | csvtk -tl rename2 \
+ | csvtk -t rename2 \
  -F \
  -f '*' \
  -p '(.+)' \
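
Note on `SUBSET_FIELDS` in the hunk above: the `grep | awk | tr | sed` pipeline drops comment lines from the field map, takes the first whitespace-separated field of each remaining line, and joins the results with commas. A rough Python equivalent is sketched here for readers less familiar with the shell idiom. The function name `subset_fields` and the sample field map are hypothetical and not part of the workflow.

```python
def subset_fields(field_map_text: str) -> str:
    """Build a comma-separated column list from a field-map file's text.

    Hypothetical helper mirroring the shell pipeline:
    lines starting with '#' are skipped, and the first
    whitespace-separated field of every other line is kept.
    """
    fields = []
    for line in field_map_text.splitlines():
        if line.startswith("#"):
            continue  # grep -v '^#'
        parts = line.split()
        # awk '{print $1}' prints an empty line when the input line is blank.
        fields.append(parts[0] if parts else "")
    # tr '\n' ',' followed by sed 's/,$//' amounts to a comma join.
    return ",".join(fields)


if __name__ == "__main__":
    # Made-up field map, for illustration only.
    sample = "# nextclade column\tmetadata column\nseqName\tseqName\nclade\tclade\n"
    print(subset_fields(sample))
```
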
7 changes: 1 addition & 6 deletions phylogenetic/defaults/auspice_config_genome.json
@@ -19,14 +19,9 @@
  },
  {
  "key": "clade",
- "title": "Genotype (via Nextclade tree)",
+ "title": "Clade (via Nextclade tree)",
  "type": "categorical"
  },
- {
- "key": "date",
- "title": "Date",
- "type": "temporal"
- },
  {
  "key": "region",
  "title": "Region",
2 changes: 1 addition & 1 deletion phylogenetic/defaults/auspice_config_prM-E.json
@@ -19,7 +19,7 @@
  },
  {
  "key": "clade",
- "title": "Genotype (via Nextclade tree)",
+ "title": "Clade (via Nextclade tree)",
  "type": "categorical"
  },
  {
2 changes: 1 addition & 1 deletion phylogenetic/defaults/color_orderings.tsv
@@ -225,7 +225,7 @@ region North America

  ################

- # genotype assigned by Nextclade
+ # clade assigned by Nextclade
  clade Clade I
  clade Clade II
  clade Clade III
11 changes: 7 additions & 4 deletions phylogenetic/defaults/description.md
@@ -14,20 +14,23 @@ Our bioinformatic processing workflow can be found at

  - sequence alignment by [augur align][]
  - phylogenetic reconstruction using [IQTREE-2][]
+ - clade assignment using [Nextclade][]
  - ancestral state reconstruction and temporal inference using [TreeTime][]

  #### Underlying data

  We curate sequence data and metadata from NCBI as starting point for
- our analyses.
+ our analyses. We gratefully acknowledge the large contribution of over
+ 500 sequences from Hill, et al.

  ---

  Screenshots may be used under a [CC-BY-4.0 license][] and attribution
  to nextstrain.org must be provided.

+ [CC-BY-4.0 license]: https://creativecommons.org/licenses/by/4.0/
+ [github.com/nextstrain/yellow-fever]: https://github.com/nextstrain/yellow-fever
+ [augur align]: https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/align.html
  [IQTREE-2]: http://www.iqtree.org/
+ [Nextclade]: https://nextstrain.org/fetch/data.clades.nextstrain.org/v3/nextstrain/yellow-fever/prM-E/2024-11-05--09-19-52Z/tree.json
  [TreeTime]: https://github.com/neherlab/treetime
- [augur align]: https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/align.html
- [github.com/nextstrain/yellow-fever]: https://github.com/nextstrain/measles
- [CC-BY-4.0 license]: https://creativecommons.org/licenses/by/4.0/
