Exploring the known chemical space of the plant kingdom: Insights into taxonomic patterns, knowledge gaps, and bioactive regions
This repository contains code and data described in detail in our paper, "Exploring the known chemical space of the plant kingdom: Insights into taxonomic patterns, knowledge gaps, and bioactive regions" (Domingo-Fernández et al., 2023).
If you have found our manuscript useful in your work, please consider citing:
Domingo-Fernandez, D.†, Gadiya, Y.†, Mubeen, S., Healey, D., Norman, B., Colluru, V. (2023). Exploring the known chemical space of the plant kingdom: Insights into taxonomic patterns, knowledge gaps, and bioactive regions. Journal of Cheminformatics. 10.1186/s13321-023-00778-w
Install requirements
python -m venv .venv && source ./.venv/bin/activate
pip install -r requirements.txt
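After installing, it can help to confirm that the active environment really contains the packages listed in requirements.txt. The following is a minimal sketch and not part of this repository: the helper names parse_requirement_names and missing_packages are hypothetical, and the parsing assumes simple lines such as name==version rather than the full requirements file syntax.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text):
    """Extract bare package names from requirements-style text.

    Assumption: simple lines such as 'pandas==1.5.0' or 'rdkit';
    option lines ('-r', '-e', '--index-url') are skipped.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue
        # The name ends at the first version specifier, extra, or space.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        absent = missing_packages(parse_requirement_names(req.read_text()))
        print("Missing packages:", absent or "none")
```

Run from the repository root with the virtual environment activated; an empty result suggests the install step completed.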
Run the notebooks located in the notebooks directory corresponding to each analysis. The prefix of each notebook indicates the order in which it should be run, which also corresponds to the Results sections of the manuscript. For detailed information about each notebook, see the README inside the notebooks directory.
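The prefix-based ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: it assumes each notebook filename starts with an integer prefix, the helper names prefix_key, ordered_notebooks, and run_all are made up for this example, and executing notebooks through jupyter nbconvert assumes Jupyter is installed in the environment.

```python
import re
import subprocess
from pathlib import Path


def prefix_key(name):
    """Sort key: leading integer of the filename, then the name itself."""
    match = re.match(r"(\d+)", name)
    # Assumption: files without a numeric prefix should sort last.
    number = int(match.group(1)) if match else float("inf")
    return (number, name)


def ordered_notebooks(names):
    """Return notebook filenames sorted by their numeric prefix."""
    return sorted((n for n in names if n.endswith(".ipynb")), key=prefix_key)


def run_all(directory="notebooks"):
    """Execute every notebook in `directory` in prefix order, in place."""
    names = [p.name for p in Path(directory).glob("*.ipynb")]
    for name in ordered_notebooks(names):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", str(Path(directory) / name)],
            check=True,  # stop at the first notebook that fails
        )
```

Parsing the prefix as an integer matters once there are ten or more notebooks, since a plain lexicographic sort would place a prefix of 10 before a prefix of 2.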
To re-create the circular plot with heat-map, make sure you have R v4.2.2 installed, navigate to notebooks/taxonomic_tree_viz, and run the R scripts. Install the libraries listed at the top of each script using the command install.packages("package_name").
The manuscript is based on publicly available data from the following resources:
Datasets are publicly available and can be directly downloaded from
Furthermore, the data directory contains all the figures of the manuscript (generated by the notebooks) as well as the raw and intermediary files (also generated by the notebooks).
- Rutz, A., Sorokina, M., Galgonek, J., Mietchen, D., Willighagen, E., Gaudry, A., ... & Allard, P. M. (2022). The LOTUS initiative for open knowledge management in natural products research. eLife, 11, e70780.
- Sorokina, M., Merseburger, P., Rajan, K., Yirik, M. A., & Steinbeck, C. (2021). COCONUT online: collection of open natural products database. Journal of Cheminformatics, 13(1), 1-13.