
Commit 9a63c9b: update treetime docs with jupyter notebook

ktmeaton committed Jul 20, 2020 (1 parent b6996a1)

Showing 1 changed file with 17 additions and 58 deletions.
docs/exhibit/exhibit_main.rst (75 changes: 17 additions and 58 deletions)
@@ -9,15 +9,15 @@ Code Installation
Clone Repository
^^^^^^^^^^^^^^^^

**Shell**::

git clone https://github.com/ktmeaton/plague-phylogeography.git
cd plague-phylogeography
conda activate plague-phylogeography-0.1.4dev

Install some accessory tools that are being tested.

**Shell**::

conda install geopy
conda install cutadapt
@@ -29,7 +29,7 @@ Database
Create
^^^^^^

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--ncbimeta_create config/ncbimeta.yaml \
@@ -77,7 +77,7 @@ Curate metadata with a DB Browser (SQLite). Examples of modifying the BioSampleC
Update, Annotate, Join
^^^^^^^^^^^^^^^^^^^^^^

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--ncbimeta_update config/ncbimeta.yaml \
@@ -95,7 +95,7 @@ Verify Samples

Select records from the database that are marked as "KEEP: Assembly".
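The selection is an ordinary SQL ``LIKE`` filter wrapped in two layers of quotes, so that the shell hands Nextflow the query with its double quotes still attached (presumably so the pipeline can pass it on to ``sqlite3`` as a single argument; that reason is an assumption). The Python sketch below checks both parts: ``shlex.split`` mimics how a POSIX shell tokenises the argument, and an in-memory SQLite ``Master`` table shows which comments ``'%KEEP%Assembly%'`` matches. Only the table name, the two column names, and the query text come from the command; the rows (``asm_keep.fna.gz`` and the others) are hypothetical placeholders.

```python
import shlex
import sqlite3

# The argument exactly as written on the nextflow command line.
arg = r'''--sqlite_select_command_asm "\"SELECT AssemblyFTPGenbank FROM Master WHERE (BioSampleComment LIKE '%KEEP%Assembly%')\""'''
tokens = shlex.split(arg)  # POSIX-style tokenising, like the shell does
# The outer quotes are consumed; the escaped inner quotes survive.
print(tokens[1])
query = tokens[1].strip('"')

# Hypothetical miniature of the NCBImeta Master table (placeholder values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Master (AssemblyFTPGenbank TEXT, BioSampleComment TEXT)")
conn.executemany(
    "INSERT INTO Master VALUES (?, ?)",
    [
        ("asm_keep.fna.gz", "KEEP: Assembly Modern"),
        ("asm_remove.fna.gz", "REMOVE: Laboratory strain"),
        ("sra_ancient", "KEEP: EAGER Ancient"),
    ],
)
# '%' matches any run of characters, so a comment must contain "KEEP"
# followed somewhere later by "Assembly" (SQLite LIKE ignores ASCII case).
selected = [row[0] for row in conn.execute(query)]
print(selected)  # ['asm_keep.fna.gz']
conn.close()
```

Because SQLite's ``LIKE`` is case-insensitive for ASCII, a comment such as "keep: assembly" would also be selected.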

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--sqlite_select_command_asm "\"SELECT AssemblyFTPGenbank FROM Master WHERE (BioSampleComment LIKE '%KEEP%Assembly%')\"" \
@@ -109,15 +109,15 @@ Select records from the database that are marked as "KEEP: Assembly".

Check that there are 475 assemblies to be downloaded.

**Shell**::

wc -l results/sqlite_import/assembly_for_download.txt
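``wc -l`` counts newline characters rather than lines, so if the last path in ``assembly_for_download.txt`` has no trailing newline the count comes out one short. A small Python alternative that counts non-blank lines is sketched below; it assumes the file holds one assembly path per line, and the demo file contents are made up.

```python
import tempfile
from pathlib import Path


def count_assemblies(path):
    """Count non-blank lines, assuming one assembly path per line."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "assembly_for_download.txt"
    # No trailing newline after the last path: `wc -l` would report 2 here.
    demo.write_text("asm_a\nasm_b\nasm_c")
    n = count_assemblies(demo)

print(n)  # 3
```

For this run, ``count_assemblies("results/sqlite_import/assembly_for_download.txt")`` is expected to return 475.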


Run Pipeline (With Outgroup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--outdir Assembly_Modern_Outgroup \
@@ -130,7 +130,7 @@ Run Pipeline (With Outgroup)
Run Pipeline (Without Outgroup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--outdir Assembly_Modern \
@@ -145,11 +145,11 @@ Run Pipeline (Without Outgroup)
(latest resume id: 9112a035-a628-4f9d-8955-faa7732a1b73)

Ancient Raw Data Analysis
-------------------------

Prepare the TSV input with ktmeaton/plague-phylogeography, selecting only EAGER Ancient samples.

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--outdir EAGER_Ancient \
@@ -162,7 +162,7 @@ Prep tsv input from ktmeaton/plague-phylogeography, select only EAGER Ancient sa

Download all samples and run them through EAGER.

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--outdir EAGER_Ancient \
@@ -174,7 +174,7 @@ Download all samples, run through EAGER

SAMN00715800: split each read after base 75 into two separate files so that the data are in proper paired-end format.

**Shell**::

# SRA run accession for SAMN00715800
runAcc=SRR341961
mv EAGER_Ancient/sra_download/fastq/single/${runAcc}_1.fastq.gz \
EAGER_Ancient/sra_download/fastq/single/${runAcc}_unsplit.fastq.gz;
@@ -195,55 +195,14 @@ SAMN00715800: Split after base 75 into two separate files to maintain proper pai

Remove the original unsplit file.

**Shell**::

rm EAGER_Ancient/sra_download/fastq/single/SRR341961_unsplit.fastq.gz
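The commands that perform the split itself fall in the part of the file this diff does not show. As a hedged illustration only, the Python sketch below assumes each record in the unsplit file holds the forward read in its first 75 bases and the reverse read after that, and writes the two halves (sequence and quality cut at the same position, read name reused) to separate gzipped FASTQ files. The ``split_fastq`` function and the ``demo_*`` file names are hypothetical.

```python
import gzip
import os
import tempfile


def split_fastq(unsplit_path, r1_path, r2_path, split_at=75):
    """Write bases [0, split_at) of each read to R1 and the rest to R2."""
    with gzip.open(unsplit_path, "rt") as src, \
            gzip.open(r1_path, "wt") as r1, \
            gzip.open(r2_path, "wt") as r2:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\n")
            src.readline()  # skip the '+' separator line
            qual = src.readline().rstrip("\n")
            # Cut sequence and quality at the same position so each output
            # record keeps matching lengths; the read name is reused in both.
            r1.write(f"{header}\n{seq[:split_at]}\n+\n{qual[:split_at]}\n")
            r2.write(f"{header}\n{seq[split_at:]}\n+\n{qual[split_at:]}\n")


# Demo on a single made-up 150 bp read.
with tempfile.TemporaryDirectory() as tmp:
    unsplit = os.path.join(tmp, "demo_unsplit.fastq.gz")
    r1_out = os.path.join(tmp, "demo_1.fastq.gz")
    r2_out = os.path.join(tmp, "demo_2.fastq.gz")
    with gzip.open(unsplit, "wt") as out:
        out.write("@read1\n" + "A" * 75 + "C" * 75 + "\n+\n" + "I" * 150 + "\n")
    split_fastq(unsplit, r1_out, r2_out)
    with gzip.open(r1_out, "rt") as handle:
        r1_lines = handle.read().splitlines()
    with gzip.open(r2_out, "rt") as handle:
        r2_lines = handle.read().splitlines()

print(r1_lines[1] == "A" * 75, r2_lines[1] == "C" * 75)  # True True
```

Reusing the same read name in both files keeps the mates pairable by name downstream.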

| Fix the metadata in the EAGER TSV input file so that the sample is now paired-end (optional: mark it as full UDG).
| Rerun EAGER pipeline
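
The metadata fix can also be scripted. The sketch below rests on assumptions: it takes the column names from the nf-core/eager 2.x TSV input layout (``SeqType``, ``UDG_Treatment``, ``R1``, ``R2``, and so on), while the ``mark_paired_end`` helper, the placeholder row values, and the ``SRR341961_1``/``SRR341961_2`` read file names are hypothetical. It switches one sample to ``PE``, points ``R1``/``R2`` at the split files, and optionally sets ``UDG_Treatment`` to ``full``.

```python
import csv
import io


def mark_paired_end(tsv_text, sample, r1, r2, udg=None):
    """Return TSV text with `sample` switched to paired-end input.

    Column names follow the nf-core/eager 2.x TSV layout (an assumption).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    for row in rows:
        if row["Sample_Name"] == sample:
            row["SeqType"] = "PE"  # previously "SE" for the unsplit file
            row["R1"] = r1
            row["R2"] = r2
            if udg is not None:
                row["UDG_Treatment"] = udg  # e.g. "full" (optional step)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical one-row input for the sample split above; values are placeholders.
columns = ["Sample_Name", "Library_ID", "Lane", "Colour_Chemistry", "SeqType",
           "Organism", "Strandedness", "UDG_Treatment", "R1", "R2", "BAM"]
values = ["SAMN00715800", "SRR341961", "1", "4", "SE", "Yersinia pestis",
          "double", "none", "SRR341961_1.fastq.gz", "NA", "NA"]
tsv_in = "\t".join(columns) + "\n" + "\t".join(values) + "\n"

tsv_out = mark_paired_end(tsv_in, "SAMN00715800",
                          "SRR341961_1.fastq.gz", "SRR341961_2.fastq.gz",
                          udg="full")
fixed = next(csv.DictReader(io.StringIO(tsv_out), delimiter="\t"))
print(fixed["SeqType"], fixed["R2"], fixed["UDG_Treatment"])
# PE SRR341961_2.fastq.gz full
```

Reading and writing through ``csv.DictReader``/``DictWriter`` keeps the original column order, so the rewritten file can be passed straight back to EAGER.
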
Treetime
------------

Run the Nextstrain and TreeTime section of the pipeline.

**Shell**::

nextflow run ktmeaton/plague-phylogeography \
--outdir Assembly_Modern \
--sqlite_select_command_asm "\"SELECT AssemblyFTPGenbank FROM Master WHERE (BioSampleComment LIKE '%KEEP%Assembly%')\"" \
--max_datasets_assembly 500 \
--skip_sra_download \
--skip_outgroup_download \
--iqtree_branch_support \
--iqtree_outgroup GCA_000323485.1_ASM32348v1_genomic,GCA_000323845.1_ASM32384v1_genomic \
--treetime \
-resume

(latest resume id: 9112a035-a628-4f9d-8955-faa7732a1b73)

Regression Plot
^^^^^^^^^^^^^^^

**Python**::

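from Bio import Phylo

# TreeTime clock output directory from the Assembly_Modern run
outdir = "Assembly_Modern/nextstrain/treetime_clock/"

# Assemblies whose common ancestor defines the subtree of interest
PY_88 = "GCA_000269405.1_ASM26940v1_genomic"
MG05_1020 = "GCA_000169635.1_ASM16963v1_genomic"
India195 = "GCA_000182505.1_ASM18250v1_genomic"

# Read the divergence tree, find the MRCA of the three assemblies,
# and write the clade below it to Newick as input for `treetime clock`
tree = Phylo.read(outdir + "divergence_tree.nexus", "nexus")
ori_subtree = tree.common_ancestor(PY_88, MG05_1020, India195)
Phylo.write(ori_subtree, outdir + "ori_subtree.nwk", "newick")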

**Shell**::

# Output directory of the Assembly_Modern pipeline run
project=Assembly_Modern
treetime clock \
--tree $project/nextstrain/treetime_clock/ori_subtree.nwk \
--dates $project/nextstrain/metadata_nextstrain_geocode_state.tsv \
--date-column BioSampleCollectionDate \
--aln $project/snippy_multi/snippy-core.full_CHROM.filter0.fasta \
--clock-filter 3 \
--keep-root \
--outdir $project/nextstrain/treetime_clock/ori_subtree/
Treetime scripts are in development as Jupyter Notebooks.
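
``treetime clock`` estimates a molecular clock by regressing each tip's root-to-tip divergence on its sampling date: the slope is the substitution rate and the x-intercept estimates the date of the root. The TreeTime help describes ``--clock-filter`` as the number of interquartile ranges from that regression beyond which tips are ignored. The pure-Python sketch below illustrates the idea on made-up numbers. It is a simplified stand-in, not TreeTime's implementation (which has more machinery, e.g. rerooting and uncertain dates), and ``root_to_tip_fit`` and ``clock_filter`` are hypothetical names.

```python
import statistics


def root_to_tip_fit(dates, divergences):
    """Ordinary least-squares fit of divergence against sampling date.

    Returns (rate, intercept); the slope is the clock rate in
    substitutions/site/year.
    """
    mean_t = statistics.fmean(dates)
    mean_d = statistics.fmean(divergences)
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = sxy / sxx
    return rate, mean_d - rate * mean_t


def clock_filter(dates, divergences, n_iqd=3.0):
    """Keep tips whose |residual| is within n_iqd interquartile ranges."""
    rate, intercept = root_to_tip_fit(dates, divergences)
    residuals = [d - (rate * t + intercept) for t, d in zip(dates, divergences)]
    q1, _, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    iqd = q3 - q1
    return [abs(r) <= n_iqd * iqd for r in residuals]


# Hypothetical toy data: sampling years and root-to-tip divergences
# (substitutions/site); the last tip is a deliberate outlier.
dates = [1850, 1870, 1890, 1910, 1930, 1950, 1970, 1990, 1920]
divergences = [0.011, 0.011, 0.015, 0.015, 0.019, 0.019, 0.023, 0.023, 0.037]

keep = clock_filter(dates, divergences)
kept_dates = [t for t, k in zip(dates, keep) if k]
kept_divs = [d for d, k in zip(divergences, keep) if k]
rate, intercept = root_to_tip_fit(kept_dates, kept_divs)
root_date = -intercept / rate  # x-intercept: date at which divergence is zero

print(keep)  # [True, True, True, True, True, True, True, True, False]
print(f"rate={rate:.3e}, root date={root_date:.1f}")  # rate=9.524e-05, root date=1741.5
```

Refitting after the filter matters: the outlier shifts the first regression line, and dropping it before estimating the rate and root date removes that bias.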
