Argenomic is a quality-diversity (or illumination) algorithm for optimization of small organic molecules.

Jonas-Verhellen/Argenomic

Quality-Diversity Optimisation for Molecular Design (GB-EPI)

Description

Argenomic is an open-source implementation of an illumination algorithm for the optimization of small organic molecules. It provides a holistic overview of how high-performing molecules are distributed throughout a search space. This novel approach produces potent but qualitatively different molecules, illuminates the distribution of optimal solutions, and improves search efficiency compared to both machine learning and traditional genetic algorithm approaches. The implementation is based on an open-source, graph-based genetic algorithm for molecular optimisation, and is influenced by state-of-the-art concepts from soft robot design. For more information, see the accompanying blog post.

Example of iterative illumination in a 2D representation of chemical space. In this case, fitness is defined as the molecular similarity to Troglitazone.
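To make the idea of illumination more concrete, here is a minimal, self-contained sketch of a MAP-Elites-style loop in plain Python. It is not Argenomic's code and does not use RDKit: candidates are toy bit sets standing in for molecular fingerprints, the two descriptors (bit counts in each half of the bit string) are invented for illustration, and the function names `tanimoto`, `niche`, and `illuminate` are hypothetical. What it does show is the core mechanism: a grid of niches where each cell keeps only the fittest candidate seen so far.

```python
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def niche(bits, n_bits, bins):
    """Map a candidate to a 2D grid cell.

    Toy descriptors (an assumption for this sketch): the number of
    'on' bits in the lower and upper half of the bit string.
    """
    half = n_bits // 2
    low = sum(1 for i in bits if i < half)
    high = len(bits) - low
    # Scale each count (0..half) to a bin index (0..bins-1).
    return (low * bins // (half + 1), high * bins // (half + 1))


def illuminate(target, n_bits=16, bins=4, generations=100, batch=20, seed=0):
    """Fill an archive mapping grid cell -> (fitness, candidate)."""
    rng = random.Random(seed)
    archive = {}

    def try_insert(candidate):
        cell = niche(candidate, n_bits, bins)
        fitness = tanimoto(candidate, target)
        # Keep the candidate only if its cell is empty or it beats the elite.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)

    # Seed the archive with random candidates.
    for _ in range(batch):
        try_insert(frozenset(i for i in range(n_bits) if rng.random() < 0.5))

    for _ in range(generations):
        parents = [candidate for _, candidate in archive.values()]
        for _ in range(batch):
            parent = rng.choice(parents)
            # Mutation: flip one randomly chosen bit of the parent.
            child = parent ^ {rng.randrange(n_bits)}
            try_insert(child)
    return archive


if __name__ == "__main__":
    target = frozenset({1, 3, 4, 8, 9, 13})
    archive = illuminate(target)
    for cell in sorted(archive):
        fitness, _ = archive[cell]
        print(cell, round(fitness, 3))
```

Because every cell retains its own elite, the result is a map of good solutions across the descriptor space rather than a single optimum, which is the property the figure above visualises.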

Getting Started

After installing the software and running the tests, a basic usage example of Argenomic (the rediscovery of Troglitazone) can be run as follows:

python3 illuminate.py generations=100
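Note that `generations=100` is passed as a `key=value` override rather than as a conventional `--flag`. The actual parsing is handled by the project's configuration tooling. The sketch below only illustrates the general idea in plain Python, under stated assumptions: the helper `apply_overrides` and the default value shown are hypothetical and are not taken from the Argenomic source.

```python
def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` updated with `key=value` override strings.

    Each value is converted to the type of the corresponding default,
    so "generations=100" yields the integer 100.
    """
    config = dict(defaults)
    for item in overrides:
        key, separator, raw_value = item.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        if key not in config:
            raise ValueError(f"unknown configuration key {key!r}")
        config[key] = type(defaults[key])(raw_value)
    return config


if __name__ == "__main__":
    # Hypothetical default; the real defaults live in the project's config.
    defaults = {"generations": 10}
    print(apply_overrides(defaults, ["generations=100"]))
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled override fails loudly instead of silently leaving the default in place.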

Installing

Download the source code from GitHub to your local machine and create the environment from the environment.yml file:

conda env create -f environment.yml

Activate the new environment:

conda activate argenomic-stable

Verify that the new environment was installed correctly:

conda env list

Running the tests

To run the unit tests:

pytest ./tests

Authors

Based on the paper "Illuminating elite patches of chemical space", Chemical Science 11.42 (2020): 11485-11491.

  • Jonas Verhellen - Concept, implementation, and development
  • Jeriek Van den Abeele - Implementation and development

Dependencies

Important dependencies of the Argenomic software environment and where to find the source.

  • Python - Python is a widely used scientific and numeric programming language.
  • RDKit - Cheminformatics and machine-learning software toolkit.
  • Scikit-learn - Machine learning and data analysis toolkit in Python.
  • OmegaConf - Configuration system for multiple sources, providing a consistent API.

Acknowledgments

  • Jan Jensen for his work in developing and open-sourcing a graph-based genetic algorithm for molecular optimisation, which served as impetus for this project.

  • Jean-Baptiste Mouret and Jeff Clune for their breakthrough invention of illumination algorithms, providing a holistic view of high-performing solutions throughout a search space.

  • Pat Walters for his scripts indicating how to run structural alerts using the RDKit and ChEMBL, and for his many enlightening medicinal chemistry blog posts.

Copyright License

This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3).