-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pruned FastTree graphic #250
Comments
In Ecgtheow, I am tweaking the nprune value around manually and checking the FastTrees. In CFT, a variety of nprunes may also be helpful (not just always 100), but am I right in assuming that all the ML trees would have to be pre-computed with just a handful of nprune values? |
Right now things are set up in CFT so that different reconstructions can have different prune counts, but there's no "upstream control mechanism" for which counts you might want to perform for a particular clonal family. We could do it so that when you run the build you specify the prune count, and can therefrom create a separate dataset that has been run with alternate prune counts. But this isn't really optimal for comparisons, and is sort of clunky on the build side having to switch datasets to look at different prune counts. So the better thing to do (and part of why I organized the way I did around "reconstructions" in the first place) would be to make it so that for a given clonal family and prune strategy, you can have multiple reconstructions. So maybe when you build at the command line, you can specify On the far end of the spectrum in terms of control, it might be possible to specify somehow in the input yaml file that cft consumes which clonal families should be run at what prune counts. Moving forward in this direction dovetails with getting us to the point where it would be easier for you to select clonal families from the viz and have ASR trees or ecgtheow analyses computed from them. |
@lauradoepker Regarding this |
@metasoarous I'm not sure. We haven't ever had problems with it. Some of my biggest clonal families have a couple thousand members. |
Hey @eharkins; Do you want to help with this? Would basically mean plugging something along the lines of the above into the SConstruct. |
Happy to help, sorry for the delay here. Will read more closely and let you know if I have questions tomorrow. |
@dunleavy005 @lauradoepker - @eharkins and I are taking a look at this and wondering what all went into the crazy awk step in your code snippet above. Is that something that is needed for your script to function properly or should we be able to safely ignore? |
lol I don't quite remember, BUT either one of RevBayes/BEAST was trimming off columns with full N's so to make the two programs consistent, we removed sequence positions from the FASTA that had 100% N's in the column. Unfortunately, Basically not needed for you guys :) P.S. my memory aint so bad!!! fhcrc/seqmagick#72 |
* added height calculation for evenly spaces leaves * floating point division * adding nt sequences for Olmsted#17 * github.com/matsengrp/olmsted/issues/13 * including multiplicity for Olmsted(11) * tabs -> spaces * nt seqs dict using tripl lookup instead of fasta parser * comment white space * first try on #250; using ecgtheow script to color pruned nodes * set prune_count back to default
@lauradoepker closing this, it seems to work for cft now with a few changes. Let me know if you have any issues. |
This script is choking on the bigger trees. I've tried to take steps to prevent it from becoming an issue by only having it run on the smaller trees, but for some reason this hasn't been working as I'd hoped. Hopefully I'm able to fix this, but if not, I may have to make it so that these targets wouldn't get built unless you specified a special flag (or some such). |
Yeah, it fails to render the image (or whatever) when the clonal family is 20,000 or 30,000 members big. Your fix sounds good @metasoarous. cc @dunleavy005 |
Not sure how I never came back around to close this issue, but it's now the case that if you run scons with the |
It's helpful to view this FastTree phylogeny that depicts which sequences were chosen for the
prune.py
step for lineage analyses. This would also be helpful for the minadcl pruning too. I've pasted @dunleavy005 's SCons code from Ecgtheow that achieves this - I hope I grabbed the right part of the code to show this (?). Example .png of what I'm talking about (red leaves were kept, black leaves were pruned away):The text was updated successfully, but these errors were encountered: