compressed tree output as file #16

Closed
TYMichaelsen opened this issue Feb 23, 2021 · 5 comments

@TYMichaelsen

Hi Anna,

I really like the speed, ease of use, and performance of pastml! Would it be possible to provide the entire compressed tree/dendrogram, as visualised in the HTML output, as a text file that can be further manipulated downstream? For example, it could follow a standard network file format such as pajek (.net). Or it could simply be two flat text files: one that indicates how the nodes link together, and one that lists the membership node for every sequence analyzed.

Such a feature would be super helpful for doing downstream associative studies of the clusters formed by pastml.

(PS: I considered scraping it from the HTML itself, but 1) only the smaller clusters list the names of the sequences within them, and 2) not all of the data is in the HTML if the dataset gets too large.)

best, Thomas

@annazhukova
Contributor

Hi Thomas,

That's indeed a very good idea, thanks!
I will try to implement it (most probably next week) and let you know.

Cheers,
Anna

@TYMichaelsen
Author

Awesome, thanks! Let me know if you need any user input; otherwise, looking forward to it.

Thomas

@TYMichaelsen
Author

Hi Anna,
Any ETA on this? I'd really like to be able to use this information for post-hoc analysis. I know you probably have tons of stuff to do, so sorry for being pushy.
Best,
Thomas

@annazhukova
Contributor

Hi Thomas,

I have added a pajek .net export of the compressed tree (without horizontal compression).
Each vertex line has the following format:
<id> "<cluster_root_id>" "tip_id_1,...,tip_id_K" "Col_1:Value" ... "Col_n:Value"

It is available in version 1.9.33 and you can run it as, for example:
singularity run docker://evolbioinfo/pastml:v1.9.33 --tree Albanian.tree.152tax.nwk --data metadata.csv --columns Country --data_sep , --html_compressed Albania.html --pajek Albania.net -v
Note that --html_compressed needs to be specified as well.
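
For downstream use, here is a minimal Python sketch of how such a .net file might be turned into the two flat tables requested above (node links and per-sequence membership). It is not part of pastml. It assumes the file uses the usual Pajek section headers (`*Vertices`, then `*Arcs` or `*Edges`) and that each vertex line matches the format described above. The function names (`parse_vertex`, `read_pajek`, `membership`) and the `SAMPLE` data are made up for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Toy input invented for illustration; it is NOT real pastml output.
SAMPLE = '''*Vertices 3
1 "root" "t1,t2" "Country:Albania"
2 "n5" "t3,t4,t5" "Country:Greece"
3 "t9" "t9" "Country:Albania"
*Arcs
1 2
1 3
'''

# Matches either a double-quoted field or a bare whitespace-free token.
_FIELD = re.compile(r'"([^"]*)"|(\S+)')


def parse_vertex(line: str) -> Tuple[str, str, List[str], Dict[str, str]]:
    """Split one vertex line into (id, cluster_root_id, tip_ids, annotations)."""
    fields = [quoted or bare for quoted, bare in _FIELD.findall(line)]
    vertex_id, root_id = fields[0], fields[1]
    tips = [t for t in fields[2].split(",") if t] if len(fields) > 2 else []
    # Assumption: the column name itself contains no colon.
    annotations = dict(f.split(":", 1) for f in fields[3:] if ":" in f)
    return vertex_id, root_id, tips, annotations


def read_pajek(lines: Iterable[str]):
    """Return (vertices, links) parsed from the lines of a .net file."""
    vertices, links, section = {}, [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):  # section header, e.g. "*Vertices 3"
            section = line.split()[0].lower()
        elif section == "*vertices":
            vid, root, tips, ann = parse_vertex(line)
            vertices[vid] = {"root": root, "tips": tips, "annotations": ann}
        elif section in ("*arcs", "*edges"):
            source, target = line.split()[:2]  # ignore optional weights
            links.append((source, target))
    return vertices, links


def membership(vertices) -> Dict[str, str]:
    """Map every tip (sequence) id to the id of the vertex containing it."""
    return {tip: vid for vid, v in vertices.items() for tip in v["tips"]}


vertices, links = read_pajek(SAMPLE.splitlines())
tip_to_vertex = membership(vertices)
```

For a real export, the same call would be `read_pajek(open("Albania.net"))`. `links` and `tip_to_vertex` can then be written out as two tab-separated files with the `csv` module.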

Let me know if it works for you.

Anna

@TYMichaelsen
Author

Hi Anna,
Thanks a lot! This was exactly what I was after and seems to work. I will let you know if I run into anything.
Best,
Thomas
