Add pruned FastTree graphic #250

lauradoepker · 2018-09-21T15:25:05Z

It's helpful to view this FastTree phylogeny that depicts which sequences were chosen for the prune.py step for lineage analyses. This would also be helpful for the minadcl pruning too. I've pasted @dunleavy005 's SCons code from Ecgtheow that achieves this - I hope I grabbed the right part of the code to show this (?). Example .png of what I'm talking about (red leaves were kept, black leaves were pruned away):

    @w.add_target()
    def pruned_cluster_seqids(outdir, c):
        pruned_cluster_seqids = env.Command(
            path.join(outdir, "pruned_cluster_seqids.txt"),
            c["cluster_fasttree"],
            "lib/cft/bin/prune.py --naive naive --seed " + options["seed"] + " $SOURCE -n " + str(c["nprune"]["value"]) + " --strategy seed_lineage $TARGET")
        env.Depends(pruned_cluster_seqids, "lib/cft/bin/prune.py")
        return pruned_cluster_seqids

    @w.add_target()
    def pruned_cluster_seqs(outdir, c):
        return env.Command(
            path.join(outdir, "pruned_cluster_seqs.fasta"),
            [c["pruned_cluster_seqids"], c["cluster_seqs"]],
            "seqmagick convert --include-from-file $SOURCES $TARGET")

    @w.add_target()
    def input_seqs(outdir, c):
        return env.Command(
            path.join(outdir, "input_seqs.fasta"),
            c["pruned_cluster_seqs"],
            "awk \'/^[^>]/ {gsub(\"N\", \"-\", $0)} {print}\' < $SOURCE > temp.fasta;" + \
            "seqmagick mogrify --squeeze temp.fasta;" + \
            "awk \'/^[^>]/ {gsub(\"-\", \"N\", $0)} {print}\' < temp.fasta > $TARGET;" + \
            "rm temp.fasta")

    @w.add_target()
    def pruned_cluster_fasttree_png(outdir, c):
        pruned_cluster_fasttree_png = env.Command(
            path.join(outdir, "pruned_cluster_fasttree.png"),
            [c["cluster_fasttree"], c["pruned_cluster_seqids"]],
            "xvfb-run -a python/annotate_fasttree_tree.py $SOURCES --naive naive --seed " + options["seed"] + " --output-path $TARGET")
        env.Depends(pruned_cluster_fasttree_png, "python/annotate_fasttree_tree.py")
        return pruned_cluster_fasttree_png

The text was updated successfully, but these errors were encountered:

lauradoepker · 2018-09-21T15:26:58Z

In Ecgtheow, I am tweaking the nprune value around manually and checking the FastTrees. In CFT, a variety of nprunes may also be helpful (not just always 100), but am I right in assuming that all the ML trees would have to be pre-computed with just a handful of nprune values?

metasoarous · 2018-09-21T23:56:56Z

Right now things are set up in CFT so that different reconstructions can have different prune counts, but there's no "upstream control mechanism" for which counts you might want to perform for a particular clonal family.

We could do it so that when you run the build you specify the prune count, and can therefrom create a separate dataset that has been run with alternate prune counts. But this isn't really optimal for comparisons, and is sort of clunky on the build side having to switch datasets to look at different prune counts.

So the better thing to do (and part of why I organized the way I did around "reconstructions" in the first place) would be to make it so that for a given clonal family and prune strategy, you can have multiple reconstructions. So maybe when you build at the command line, you can specify --prune-counts=100,300 to compare?

On the far end of the spectrum in terms of control, it might be possible to specify somehow in the input yaml file that cft consumes which clonal families should be run at what prune counts. Moving forward in this direction dovetails with getting us to the point where it would be easier for you to select clonal families from the viz and have ASR trees or ecgtheow analyses computed from them.

metasoarous · 2018-09-23T04:19:04Z

@lauradoepker Regarding this annotate_fasttree_tree.py script, we should be able to move that into the pipeline easily enough. Just out of curiosity though, how well does this script handle for big trees?

lauradoepker · 2018-09-25T17:14:48Z

@metasoarous I'm not sure. We haven't ever had problems with it. Some of my biggest clonal families have a couple thousand members.

metasoarous · 2018-09-27T06:19:22Z

Hey @eharkins; Do you want to help with this? Would basically mean plugging something along the lines of the above into the SConstruct.

eharkins · 2018-09-28T03:58:43Z

Happy to help, sorry for the delay here. Will read more closely and let you know if I have questions tomorrow.

metasoarous · 2018-09-28T22:44:47Z

@dunleavy005 @lauradoepker - @eharkins and I are taking a look at this and wondering what all went into the crazy awk step in your code snippet above. Is that something that is needed for your script to function properly or should we be able to safely ignore?

dunleavy005 · 2018-09-28T22:50:03Z

lol I don't quite remember, BUT either one of RevBayes/BEAST was trimming off columns with full N's so to make the two programs consistent, we removed sequence positions from the FASTA that had 100% N's in the column. Unfortunately, seqmagick --squeeze only trims off -'s, so the awk nonsense is just converting everything to dashes, trimming, then back to N's.

Basically not needed for you guys :)

P.S. my memory aint so bad!!! fhcrc/seqmagick#72

* added height calculation for evenly spaces leaves * floating point division * adding nt sequences for Olmsted#17 * github.com/matsengrp/olmsted/issues/13 * including multiplicity for Olmsted(11) * tabs -> spaces * nt seqs dict using tripl lookup instead of fasta parser * comment white space * first try on #250; using ecgtheow script to color pruned nodes * set prune_count back to default

eharkins · 2018-10-08T20:28:31Z

@lauradoepker closing this, it seems to work for cft now with a few changes. Let me know if you have any issues.

metasoarous · 2018-10-25T08:58:13Z

This script is choking on the bigger trees. I've tried to take steps to prevent it from becoming an issue by only having it run on the smaller trees, but for some reason this hasn't been working as I'd hoped. Hopefully I'm able to fix this, but if not, I may have to make it so that these targets wouldn't get built unless you specified a special flag (or some such).

lauradoepker · 2018-10-29T17:15:19Z

Yeah, it fails to render the image (or whatever) when the clonal family is 20,000 or 30,000 members big. Your fix sounds good @metasoarous. cc @dunleavy005

metasoarous · 2019-02-16T08:10:52Z

Not sure how I never came back around to close this issue, but it's now the case that if you run scons with the --fasttree-png flag, it should build out the fasttree images when it can.

eharkins added a commit to eharkins/cft that referenced this issue Oct 5, 2018

first try on matsengrp#250; using ecgtheow script to color pruned nodes

153360b

eharkins closed this as completed Oct 8, 2018

metasoarous reopened this Oct 25, 2018

metasoarous closed this as completed Feb 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pruned FastTree graphic #250

Add pruned FastTree graphic #250

lauradoepker commented Sep 21, 2018

lauradoepker commented Sep 21, 2018

metasoarous commented Sep 21, 2018

metasoarous commented Sep 23, 2018

lauradoepker commented Sep 25, 2018

metasoarous commented Sep 27, 2018

eharkins commented Sep 28, 2018

metasoarous commented Sep 28, 2018

dunleavy005 commented Sep 28, 2018 •

edited

Loading

eharkins commented Oct 8, 2018

metasoarous commented Oct 25, 2018

lauradoepker commented Oct 29, 2018

metasoarous commented Feb 16, 2019

Add pruned FastTree graphic #250

Add pruned FastTree graphic #250

Comments

lauradoepker commented Sep 21, 2018

lauradoepker commented Sep 21, 2018

metasoarous commented Sep 21, 2018

metasoarous commented Sep 23, 2018

lauradoepker commented Sep 25, 2018

metasoarous commented Sep 27, 2018

eharkins commented Sep 28, 2018

metasoarous commented Sep 28, 2018

dunleavy005 commented Sep 28, 2018 • edited Loading

eharkins commented Oct 8, 2018

metasoarous commented Oct 25, 2018

lauradoepker commented Oct 29, 2018

metasoarous commented Feb 16, 2019

dunleavy005 commented Sep 28, 2018 •

edited

Loading