Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional gems, modifying scripts, and installing third party tools required #1

Closed
sabifo4 opened this issue Mar 27, 2024 · 6 comments

Comments

@sabifo4
Copy link
Contributor

sabifo4 commented Mar 27, 2024

Hi Sishuo,

First and foremost, this looks like a very promising tool, congratulations!

We (@sabifo4 and @ERRMoody) have tried your bs_inBV tool but had to make some changes to your scripts, install additional gems, and other third party tools before we could make it work. I list below those changes but, if it is easier, let me know if you want me to suggest a pull request!

  • Users need to install not only parallel and colorize, but also all the ruby gems that are required in the ruby scripts available as part of this tool:
    • parallel
    • colorize
    • find
    • getoptlong
    • time
    • fileutils
    • tmpdir
    • bio
    • bio-nwk
  • Command require_relative would not work. Consequently, we could not successfully load the functions written in the scripts available within the lib directory. To that end, we had to include $LOAD_PATH << './lib' as current line 12 in script create_hessian_by_bootstrapping.rb and replaced commands require_relative with require to load the ruby scripts in the lib directory: Dir.rb, processbar.rb, and do_mcmctree.rb (i.e., current lines 14-16). We also had to include extension rb when calling these scripts.
  • We had issues when installing newick-utils from the source code. Therefore, we searched for pre-compiled binaries and found them on this website. Specifically, we downloaded and installed Newick Utilities v1.6.
  • R is also required so, if users have not yet installed it via the terminal (i.e., WSL will need to install R via the terminal as R installed on Windows does not work), they will have to follow the instruction on the CRAN to get it installed. Command Rscript is also required. Package MASS, which is used by script reorder_node.rb, is also required.
  • We modified the path to PAML programs in line 40 in script do_mcmctree.rb. If users have an alias for MCMCtree, they will have to change line 43 accordingly. If users have not exported IQ-TREE to their PATH or have an alias different from the one specified in line 26 in script create_hessian_by_bootstrapping.rb, they will need to change that too. The same applies to lines 27-28 for newick-utils programs and line 29 for MCMCtree in that same script. Perhaps you could modify the scripts so that these paths were automatically found or add an argument for users to pass their own paths :)

Hope these suggestions help to make this tool a bit more portable!
Sandy

@evolbeginner
Copy link
Owner

Dear Sandra,

So grateful to you for your interest in this tool, your careful check, and detailed suggestions!!! Certainly a pr would be very helpful to improve the tool. Your suggestions make much sense and I'd be more than happy if u can pr. I'll also take a look and test my old code on a diff server after eastern.

the reason is likely b/c other relevant tools/packages have been made ready so that members from my previous lab (haiwei luo lab) didn't see any problems run it. i did not continue working on this and relevant projects on my new dept's server so didn't find any problem either (but i am actually thinking of a diff way to improve hessian estimation). My sincere apologies for any inconvenience. Should there be any questions/suggestions pls don't hesitate to let me know. Again, thanks much for your help!

some thoughts:

  • Command require_relative would not work. Consequently, we could not successfully load the functions written in the scripts available within the lib directory. To that end, we had to include $LOAD_PATH << './lib' as current line 12 in script create_hessian_by_bootstrapping.rb and replaced commands require_relative with require to load the ruby scripts in the lib directory: Dir.rb, processbar.rb, and do_mcmctree.rb (i.e., current lines 14-16). We also had to include extension rb when calling these scripts.

perhaps due to diff versions of ruby. I should have indicated this but the way u solved it seems quite good.

Newick Utilities v1.6.

thx for pointing this out. I heard some had issues installing nw. perhaps because I used it so often that I didn't think it is an issue.

best,
sishuo

evolbeginner added a commit that referenced this issue Apr 6, 2024
Update scripts and README.md file to deal with Issue #1
@sabifo4 sabifo4 closed this as completed Apr 7, 2024
@evolbeginner
Copy link
Owner

evolbeginner commented Apr 10, 2024

Dear Sandra,

Thanks so much for your contribution and testing the initial version of the tool! Also my apologizes for any inconvenience in using v1.0.

I have uploaded a newer version on top of your modifications. some updates are

  1. a check_dependency.rb file, which is suggested to run before using the tool. I think ruby v2.7+ is needed and the testing was successful in both v2.7 and v3.2.2.

  2. in readme file, note 3, about dealing with singular covariance matrix, is added. Interestingly, when i told mario in smbe2023 about it he told me that your paper; eq. 7 used a similar trick yrs ago. this is already included in the release v1.0.

  3. I changed in the readme file about mixture model implementation in "iqtree and raxml" to "iqtree and phyml" as i am unsure if raxml has implemented any but phyml has ex2, lg4m, etc. and actually they're also (among) the first to propose it based on my limited knowledge. I think I wrote it wrong in v1.0.

  4. for the following line, actually no need to change as that part of the script lib/do_mcmctree.rb is not used. but thx for the suggestion!
    image

  5. create_hessian_by_bootstrapping.rb line 37, now can automatically detect "iqtree2", otherwise "iqtree".

  6. acknowledgment to you in readme. thx so much!

should there be any suggestions/questions, pls feel free to let me know. thank you!

best,
sishuo

@sabifo4
Copy link
Contributor Author

sabifo4 commented Apr 12, 2024

Dear Sishuo,

Thanks for going through all the suggestions so carefully, this is fantastic!

Checking whether all Ruby dependencies are installed and automatically detecting which IQ-TREE version users have installed will help a lot! Perhaps a feature you could implement in the next release would be for bs_inBV to check for third parties that are not installed/cannot be found (e.g., IQ-TREE, PAML, Newick Utilies) and stop with a warning -- not urgent as the README.md explains what third-party tools need to be installed and how to install them, but something you may want to consider in the future :)

Regarding the issues with matrices, you are correct! We had problems with matrices built with GMM data because they normally have more characters than specimens (i.e., more columns than rows), which normally results in singular matrices that cannot be inverted. Following (Schäfer and Strimmer 2005), we decided to obtain a shrinkage estimate of the correlation matrix that could be inverted -- see function corpcor::cor.shrink in their corpcor R package, which we used to estimate the shrinkage correlation matrix in our mcmcr R package. I can see how something similar may happen when you have MLEs of branch lengths very close to 0 in the matrix. Perhaps you could test the corpcor::cov.shrink function for such cases so that people who work with closely-related taxa or experience issues with the bootstrap values despite running many iterations can benefit from using bs_inBV :)

Lastly, thanks ever so much for the acknowledgements, really appreciate it! While I made all the suggestions to clarify the content in the original README.md file (v1.0) and first tested bs_inBV v1.0 on my PC, you may also want to thank Edmund R. R. Moody. He tested v1.0 in an HPC, which helped identify the need for installing some of the Ruby gems that I had already installed on my PC and could not originally flag, while also reproducing the issues with the Newick Utilities package I experienced on a PC.

Thanks for spending time to address this GitHub issue with your latest implementations, looking forward to learning about your new way to improve Hessian estimation!

All the best,
Sandy

@evolbeginner
Copy link
Owner

evolbeginner commented Apr 20, 2024

Dear Sishuo,

Thanks for going through all the suggestions so carefully, this is fantastic!

Checking whether all Ruby dependencies are installed and automatically detecting which IQ-TREE version users have installed will help a lot! Perhaps a feature you could implement in the next release would be for bs_inBV to check for third parties that are not installed/cannot be found (e.g., IQ-TREE, PAML, Newick Utilies) and stop with a warning -- not urgent as the README.md explains what third-party tools need to be installed and how to install them, but something you may want to consider in the future :)

Regarding the issues with matrices, you are correct! We had problems with matrices built with GMM data because they normally have more characters than specimens (i.e., more columns than rows), which normally results in singular matrices that cannot be inverted. Following (Schäfer and Strimmer 2005), we decided to obtain a shrinkage estimate of the correlation matrix that could be inverted -- see function corpcor::cor.shrink in their corpcor R package, which we used to estimate the shrinkage correlation matrix in our mcmcr R package. I can see how something similar may happen when you have MLEs of branch lengths very close to 0 in the matrix. Perhaps you could test the corpcor::cov.shrink function for such cases so that people who work with closely-related taxa or experience issues with the bootstrap values despite running many iterations can benefit from using bs_inBV :)

Lastly, thanks ever so much for the acknowledgements, really appreciate it! While I made all the suggestions to clarify the content in the original README.md file (v1.0) and first tested bs_inBV v1.0 on my PC, you may also want to thank Edmund R. R. Moody. He tested v1.0 in an HPC, which helped identify the need for installing some of the Ruby gems that I had already installed on my PC and could not originally flag, while also reproducing the issues with the Newick Utilities package I experienced on a PC.

Thanks for spending time to address this GitHub issue with your latest implementations, looking forward to learning about your new way to improve Hessian estimation!

All the best, Sandy

Many thanks sandra! I have added an acknowledgment section for Edmund and you in Reademe.md https://github.com/evolbeginner/bs_inBV/blob/master/README.md#acknowledgement.

As to the "Ruby dependencies" issue, I have added the following at the very beginning of "create_hessian_by_bootstrapping.rb".

image

Also thanks for your interpretation on the shrinkage method! I will certainly delve deeper into your paper to get more details. btw, will you attend evol2024 or smbe2024? perhaps we can exchange some insights further.

@sabifo4
Copy link
Contributor Author

sabifo4 commented Apr 29, 2024

That looks great, Sishuo, thanks ever so much!! I am going to Evolution, so we can catch up this August -- I shall send you an email later.

We can now leave this issue closed and, if I find anything else, I shall open a new one. Have a good start of the week!

@evolbeginner
Copy link
Owner

That looks great, Sishuo, thanks ever so much!! I am going to Evolution, so we can catch up this August -- I shall send you an email later.

We can now leave this issue closed and, if I find anything else, I shall open a new one. Have a good start of the week!

Hi Sandra, I actually will go to smbe2024 and evol2024 virtual. will present in education session link in both. just got acc lett from both. what a pity but i believe there are many opportunities later.

Yes, any Qs or suggestions, pls don't hesitate to let me know. Thanks so much for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants