About
Getting Started with CDPKit
Installation
Basic Usage
Further Exploration
Documentation and Resources
The official documentation of CDPKit can be found at https://cdpkit.org.
CDPKit is an open-source cheminformatics software toolkit implemented in C++. In addition to the C++ API, a Python interfacing layer is provided that makes all of CDPKit's functionality accessible from Python scripts. CDPKit features a high-quality, well-tested, modular implementation of the basic functionality typically required by any higher-level software application in the field of cheminformatics. This includes data structures for the representation and analysis of chemical structures, routines for the I/O of small molecule and macromolecular data in different file formats (MDL Mol and SDF, Mol2, PDB, MMTF, SMILES, etc.), ring and aromaticity perception, pharmacophore generation, substructure searching, molecular fingerprinting, molecule fragment generation (RECAP, BRICS), 2D structure layout and visualization, 3D structure and conformer generation (CONFORGE), physicochemical property prediction, and so on.
At its core, CDPKit delivers a set of command line tools and software libraries (CDPL) that enable researchers to work with molecular data in a systematic and efficient manner, allowing seamless integration with other software and databases.
Furthermore, CDPKit integrates with various machine learning and data mining libraries, enabling scientists to build predictive models for molecular properties. This makes it a valuable tool in the field of computational drug discovery, where machine learning is employed to predict the biological activity, toxicity, and other properties of potential drug candidates. An example of such an integration can be found in the source code accompanying the publication "Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks".
This short guide will help you get started with CDPKit and introduce you to some of its key features.
Currently, CDPKit can be installed in three ways: i) using one of the CDPKit binary installers provided for download on the Releases page, ii) by building CDPKit from source and installing it, and iii) using the pip package manager (installs the CDPL Python bindings only).
More elaborate installation instructions can be found in the CDPKit documentation.
Build requirements and dependencies:
- C++17 compliant compiler (mandatory)
- cmake (V >= 3.17, mandatory)
- boost-devel (V >= 1.63, mandatory)
- python-devel V3.x (optional) and Python V3.x interpreter (mandatory)
- Qt5-devel (optional)
- cairo-devel (V >= 1.14, optional)
- sphinx (V >= 4.5, optional)
- sphinx-rtd-theme (optional)
- sphinx-inline-tabs (optional)
- sphinx-sitemap (optional)
- sphinxcontrib-bibtex (optional)
- docs-versions-menu (optional)
- doxygen (V >= 1.8.5, optional)
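Before running cmake, it can help to confirm that the mandatory command-line tools are on the PATH and recent enough. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `version_at_least`, `check_tool`) are hypothetical and not part of CDPKit, and the minimum cmake version is taken from the list above.

```python
import re
import shutil
import subprocess

# Minimum versions of mandatory command-line build tools (taken from the list above)
MIN_VERSIONS = {"cmake": "3.17"}

def parse_version(text):
    """Extract the first dotted version number (e.g. '3.22.1') as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def version_at_least(found, required):
    """Compare two version strings component-wise, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(required)
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return a >= b

def check_tool(name, minimum):
    """Return True if `name` is on the PATH and reports a version >= `minimum`."""
    path = shutil.which(name)
    if path is None:
        return False
    try:
        # cmake prints e.g. "cmake version 3.22.1" on the first line of its --version output
        out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
        return version_at_least(out, minimum)
    except (OSError, ValueError):
        return False

if __name__ == "__main__":
    for tool, minimum in MIN_VERSIONS.items():
        status = "OK" if check_tool(tool, minimum) else f"missing or older than {minimum}"
        print(f"{tool}: {status}")
```

Library versions such as boost-devel or cairo-devel are not command-line tools and are better left to cmake's own dependency checks.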
Build and install CDPKit
The makefiles are generated using cmake as follows (assuming a build on a Linux host):
$ mkdir <BUILD-DIR>
$ cd <BUILD-DIR>
$ cmake <CDPKIT-SOURCE-DIR>
If the makefiles have been generated without errors, invoking make from within <BUILD-DIR> starts the actual build process:
$ make
Building CDPKit should proceed without any issues on current Linux systems. If the build finished without errors,
$ make install
will install CDPKit in the /opt directory of your system (a different install location can be specified by a -DCMAKE_INSTALL_PREFIX=<INSTALL-DIR> argument on the cmake command line).
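The build steps above can also be scripted. The sketch below assumes a Linux host with cmake and make on the PATH; it assembles the configure, build and install commands (optionally adding -DCMAKE_INSTALL_PREFIX) and only executes them when `dry_run` is False. The function names `cdpkit_build_commands` and `build_cdpkit` are hypothetical helpers, not part of CDPKit.

```python
import subprocess
from pathlib import Path

def cdpkit_build_commands(source_dir, install_prefix=None):
    """Return the configure, build and install commands described above."""
    # Resolve the source path now, because cmake is later run from inside the build dir
    configure = ["cmake", str(Path(source_dir).resolve())]
    if install_prefix is not None:
        # Override the default install location
        configure.append(f"-DCMAKE_INSTALL_PREFIX={install_prefix}")
    return [configure, ["make"], ["make", "install"]]

def build_cdpkit(source_dir, build_dir, install_prefix=None, dry_run=True):
    """Create the build directory and run each command from inside it."""
    commands = cdpkit_build_commands(source_dir, install_prefix)
    if dry_run:
        for cmd in commands:
            print("$", " ".join(cmd))
        return commands
    Path(build_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        # check=True stops at the first failing step, mirroring "if ... without errors"
        subprocess.run(cmd, cwd=build_dir, check=True)
    return commands
```

For example, `build_cdpkit("CDPKit", "CDPKit-build", install_prefix=str(Path.home() / "cdpkit"), dry_run=False)` would configure, build and install into a user-writable directory (the paths here are placeholders).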
Generating CDPKit documentation (optional)
For a successful build of the CDPKit documentation pages, sphinx-build and the listed Sphinx extensions need to be available on the build host. Furthermore, generating the CDPL C++ and Python API documentation requires doxygen to be installed.
If all prerequisites are fulfilled,
$ make doc
should successfully build the CDPKit documentation pages, which can then be found in <BUILD-DIR>/Doc/html.
Installation via pip
Option 1: Installation of the latest stable CDPKit release published on PyPI:
$ pip install cdpkit
If available for your platform and Python version, this command will directly install a pre-built binary package (wheel file) of the CDPL Python bindings. If no matching binary package can be found, the source code package will be downloaded and an on-the-fly build will be attempted. For a successful build, the following requirements and dependencies apply:
- C++17 compliant compiler (mandatory)
- boost-devel (V >= 1.63, mandatory)
- python-devel and Python interpreter (V >= 3.6, mandatory)
- cairo-devel (V >= 1.14, optional)
Option 2: Build and installation of the current development version by specifying the GitHub repository URL:
$ pip install git+https://github.com/molinfo-vienna/CDPKit.git
Option 3: Installation from a local directory containing the CDPKit sources:
Change the current working directory to the CDPKit source code folder and run
$ pip install .
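After any of the pip-based options, a quick way to confirm the installation is to query the package metadata and check whether the CDPL package can be found. This sketch assumes Python >= 3.8 (for importlib.metadata) and that the distribution name is `cdpkit`, as in the pip install command of Option 1; the helper names are hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

def installed_cdpkit_version(dist_name="cdpkit"):
    """Return the installed version string of the given distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

def cdpl_importable():
    """True if the top-level CDPL package can be found on the current import path."""
    return importlib.util.find_spec("CDPL") is not None

if __name__ == "__main__":
    ver = installed_cdpkit_version()
    print("cdpkit distribution:", ver if ver else "not installed")
    print("CDPL importable:", cdpl_importable())
```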
Once CDPKit is installed, you can start using it in your Python code (note: PYTHONPATH has to include the <INSTALL-DIR>/Python directory; this is automatically the case when installed via pip).
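For a CDPKit build installed from source, PYTHONPATH is usually set in the shell (e.g. export PYTHONPATH=<INSTALL-DIR>/Python:$PYTHONPATH). For quick experiments, the directory can alternatively be prepended to sys.path at runtime, as in the sketch below. The default location /opt/CDPKit and the environment variable CDPKIT_INSTALL_DIR are hypothetical placeholders; substitute your actual <INSTALL-DIR>.

```python
import os
import sys
from pathlib import Path

def add_cdpkit_to_path(install_dir):
    """Prepend <INSTALL-DIR>/Python to sys.path (if not already present) and return it."""
    python_dir = str(Path(install_dir) / "Python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir

# Hypothetical install location; replace with the prefix used for 'make install'
INSTALL_DIR = os.environ.get("CDPKIT_INSTALL_DIR", "/opt/CDPKit")
add_cdpkit_to_path(INSTALL_DIR)

try:
    from CDPL import Chem  # succeeds only if the bindings are found on sys.path
    print("CDPL found")
except ImportError:
    print("CDPL not found; check the install directory")
```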
Here's an example to get you started with basic ligand-based pharmacophore generation starting from a SMILES string:
# Import the necessary CDPKit modules
from CDPL import Chem
from CDPL import Pharm
# parse the molecule from a SMILES string (toluene)
mol = Chem.parseSMILES('Cc1ccccc1')
# print the number of atoms and bonds of the molecule
print(f'Processing molecule with {mol.numAtoms} atoms and {mol.numBonds} bonds')
# create an instance of the pharmacophore data structure
ph4 = Pharm.BasicPharmacophore()
# prepare the molecule for pharmacophore generation
Pharm.prepareForPharmacophoreGeneration(mol)
# generate the ligand-based pharmacophore model for the molecule (stored in ph4)
Pharm.DefaultPharmacophoreGenerator(mol, ph4)
# print the number of features and the feature type composition
print(f' -> Generated {ph4.numFeatures} features: {Pharm.generateFeatureTypeHistogramString(ph4)}')
CDPKit offers a vast range of functionality beyond the basic usage example shown above. Some areas to explore include:
- Substructure Searching: CDPKit provides powerful methods to search for substructures within molecules.
- Descriptor Calculation: CDPKit allows you to calculate various molecular descriptors, such as Lipinski's Rule-of-Five properties, topological fingerprints, and much more. These descriptors can be used to model molecular properties and predict biological activities.
- Chemical Reactions: CDPKit fully supports chemical reactions with dedicated data structures and functionality for I/O, reaction substructure search, 2D visualization, reaction transformations, and so on.
- Machine Learning Integration: CDPKit integrates well with machine learning libraries like scikit-learn, PyTorch, and TensorFlow. You can use CDPKit to preprocess molecular data, extract features, and build predictive models for various chemical properties.
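Most machine learning libraries expect fixed-length numeric feature vectors. The following standard-library sketch shows how a binary fingerprint, given as a set of "on" bit positions, can be turned into such a vector, and how two fingerprints can be compared with the Tanimoto coefficient. The bit positions are made-up placeholders; in practice they would come from one of CDPKit's fingerprint generators, whose API is not shown here.

```python
from typing import Iterable, List, Set

def to_feature_vector(on_bits: Iterable[int], num_bits: int) -> List[int]:
    """Convert a set of 'on' bit indices into a dense 0/1 list of length num_bits."""
    vec = [0] * num_bits
    for bit in on_bits:
        if not 0 <= bit < num_bits:
            raise ValueError(f"bit index {bit} out of range for {num_bits} bits")
        vec[bit] = 1
    return vec

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)

# Hypothetical fingerprints of two molecules (bit positions are placeholders)
fp1 = {3, 17, 42, 101}
fp2 = {3, 17, 99, 101, 200}

# One row per molecule, e.g. usable as the feature matrix X for scikit-learn
X = [to_feature_vector(fp, 256) for fp in (fp1, fp2)]
print(sum(X[0]), sum(X[1]))          # 4 5
print(round(tanimoto(fp1, fp2), 3))  # 0.5
```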
To learn more about CDPKit and explore its features in detail, refer to the official documentation and additional resources:
- CDPKit Documentation: Visit the CDPKit documentation for comprehensive information, tutorials, and examples.
- CDPKit Cookbook: Explore the CDPKit Cookbook for a collection of code snippets and examples showcasing various CDPKit functionalities. The documentation page will be available for external usage soon.
- CDPKit Conformer Generator: Have a look at the CONFORGE paper for detailed information about the integrated high-quality conformer generator.
- CDPKit ML integration example: Check out the GitHub repository of the "Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks" publication to explore the ML integration possibilities of CDPKit.