An ad hoc hierarchical clustering algorithm that extracts and ranks pockets. Extraction is based on geometrical primitives generated by the NanoShaper molecular surface software. Ranking is based on an Isolation Forest anomaly detector.
This script performs a hierarchical single-linkage clustering of "(regular) spherical probes" extracted from several calls to the NanoShaper software. NanoShaper is called externally by the script. The clustering process is detailed in a paper in preparation.
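The idea of single-linkage clustering of spheres can be illustrated with a minimal sketch. This is not the pickPocket algorithm: the actual linkage criterion (and how alpha and beta enter it) is described in the paper in preparation. The rule used here (two spheres are linked when their centres are closer than the sum of their radii plus a tolerance), the function name `single_linkage` and the `gap` parameter are all assumptions made for illustration, and the sketch returns only the flat clusters obtained at one fixed threshold.

```python
import math


def single_linkage(spheres, gap=0.0):
    """Group spheres into single-linkage clusters at a fixed threshold.

    spheres: list of (x, y, z, radius) tuples.
    gap: hypothetical tolerance; spheres i and j are linked when the
         distance between their centres is at most r_i + r_j + gap.
    Returns a sorted list of clusters, each a list of sphere indices.
    """
    parent = list(range(len(spheres)))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spheres)):
        for j in range(i + 1, len(spheres)):
            xi, yi, zi, ri = spheres[i]
            xj, yj, zj, rj = spheres[j]
            d = math.dist((xi, yi, zi), (xj, yj, zj))
            if d <= ri + rj + gap:
                # One link is enough to merge two clusters (single linkage).
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spheres)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With single linkage, a chain of pairwise-overlapping spheres ends up in one cluster even if its two ends do not touch each other.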
Free parameters
DEFAULT: alpha=0, beta=0.6, rp_max=3 (Angstroms)
alpha: How easily probes of the same radius cluster together (larger --> clusters that are wider laterally; better for large, shallow sites).
beta: How easily probes of different radii cluster together (larger --> deeper, more ramified clusters).
rp_max: Largest probe radius. The minimum is 1.4 (water molecule) and the series of clustered spheres is [1.4,..,rp_max] in increments of 0.1.
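As an illustration of the radius series described for rp_max, the following sketch lists the probe radii from 1.4 up to rp_max in steps of 0.1. The function name `probe_radii` is hypothetical and is not part of pickPocket; rounding to one decimal assumes the 0.1 step stated above.

```python
def probe_radii(rp_max=3.0, rp_min=1.4, step=0.1):
    """Return the probe radii [rp_min, ..., rp_max] in increments of `step`."""
    # Count the steps with rounding to avoid floating-point drift.
    n_steps = int(round((rp_max - rp_min) / step))
    return [round(rp_min + k * step, 1) for k in range(n_steps + 1)]


radii = probe_radii()  # default rp_max = 3
print(len(radii), radii[0], radii[-1])
```

With the default rp_max=3 the series contains 17 radii.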
Ranking is based on an Isolation Forest (IF) anomaly detector. The IF is provided as a previously trained scikit-learn object, loaded from a provided binary file (in pickPocket/trainedModels).
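The following sketch shows the general pattern of restoring a trained scikit-learn Isolation Forest and using its scores to order candidates. It is not the pickPocket ranking code: the features are random placeholders, the model is trained on the spot, `pickle` is only assumed as the serialization format of the provided binary file, and the choice of sorting from highest to lowest score is an assumption.

```python
import pickle

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Placeholder training features (NOT the features used by pickPocket).
train = rng.normal(size=(200, 3))
model = IsolationForest(n_estimators=50, random_state=0).fit(train)

# Serialize and restore; this stands in for loading the provided binary file.
blob = pickle.dumps(model)
loaded = pickle.loads(blob)

# Score some candidate feature vectors. In scikit-learn, lower
# score_samples values mean "more anomalous".
candidates = rng.normal(size=(5, 3))
scores = loaded.score_samples(candidates)

# Assumed ranking direction: highest score first.
order = np.argsort(-scores)
print(order, scores[order])
```

A model restored this way should be used with the same scikit-learn version it was trained with, since pickled estimators are not guaranteed to load across versions.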
- install patchelf:
- sudo apt-get install patchelf (Ubuntu)
- or see https://gist.github.com/ruario/80fefd174b3395d34c14
- The NanoShaper executable is provided but must be linked to the libraries. To do so, run the install_script within the install binaries folder and follow the prompted instructions (type ./install_script).
- (Recommended) Recompile the shared library locally. This is done by running the install_script and following the instructions (gcc required).
To run the install script, move into the install binaries folder and type: ./install_script (it might be necessary to change permissions first: chmod +x install_script)
Using git lfs (recommended)
- git clone the folder
- install git lfs
- run: git lfs pull
Without using git lfs: download from https://istitutoitalianotecnologia-my.sharepoint.com/:f:/g/personal/luca_gagliardi_iit_it/ErrEE6yVBGpIt_f1z43nKxkB9HZap-EtaeIFUrGzXfHRew?e=SJwohi and copy the content into pickPocket/trainedModels/
Contact me if the link has expired (the git lfs method above should always work).
- numpy
- scikit-learn
First check the Requirements.
You might need to install setuptools: pip3 install -U pip setuptools
Then, from within the folder, run: pip3 install .
CAREFUL: In a virtual environment you may need to force pip to install the package in place (the default behavior is to copy it to another location), so that the paths to the libraries remain valid. Installing with the -e option (develop mode) should prevent this problem.
The library should then be available for import (see advanced use), or it can be used as an executable (recommended).
python3 -m pickPocket <file.pqr>
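To process several structures with the executable form shown above, the command can be built once per file from Python. This is a minimal sketch: the helper name `build_commands` is hypothetical, it only constructs the `-m pickPocket <file.pqr>` invocation documented above (using the running interpreter instead of a hard-coded python3), and it does not assume any additional command-line options.

```python
import sys


def build_commands(paths):
    """Return one `python -m pickPocket <file.pqr>` command per .pqr path."""
    return [
        [sys.executable, "-m", "pickPocket", str(path)]
        for path in paths
        if str(path).endswith(".pqr")  # ignore anything that is not a PQR file
    ]


for cmd in build_commands(["1abc.pqr", "notes.txt", "2xyz.pqr"]):
    print(" ".join(cmd))
```

Each command can then be executed with `subprocess.run(cmd, check=True)`; the file names above are placeholders.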
OUTPUTS:
Note: the numbering reflects the ranking.
- logfile --> contains recap of info (same printed on stdout)
- output_<pqr_name>.txt --> summary of ranked pockets (scores, subpockets, etc.)
- errorLog.txt --> errors and warnings
- Folder 6gj6_Pfiles contains:
- clusterPocket<pocket_number>.pqr --> dummy "atoms" to represent the probe spheres. Compatible with VMD.
- p<pocket_number>.off --> the triangulation of the above. Compatible with VMD.
- p<pocket_number>_atm.pqr --> the protein surface atoms belonging to the pocket envelope (recommended for practical use). Compatible with VMD.
- Similarly for sub when subpockets are available.
- infoPocket<pocket_number>.txt --> info on residues and (pseudo) mouths with relative normals. Note: you might want to post-process it with functions.getEntrance()
- <structure_name>.vert and .face for nice triangulation in VMD of the structure. This is a "classical" NanoShaper output.
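The PQR outputs listed above can be post-processed with a few lines of Python. The sketch below reads sphere centres and radii from PQR text, assuming the common whitespace-separated PQR layout in which the last five fields of each ATOM/HETATM record are x, y, z, charge and radius. The function name `read_pqr_spheres` is hypothetical, and the exact layout of the files written by pickPocket should be checked before relying on it.

```python
def read_pqr_spheres(lines):
    """Return a list of (x, y, z, radius) tuples from PQR-formatted lines.

    Assumes whitespace-separated records whose last five fields are
    x, y, z, charge, radius. Other record types are skipped.
    """
    spheres = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue  # too short to be a complete PQR atom record
        x, y, z, _charge, radius = (float(v) for v in fields[-5:])
        spheres.append((x, y, z, radius))
    return spheres


# Made-up sample records, for illustration only.
sample = [
    "REMARK made-up example",
    "ATOM      1    C  NSR A   1      10.000  -2.500   3.250  0.00  1.40",
    "ATOM      2    C  NSR A   1      11.500  -2.000   3.000  0.00  1.50",
]
print(read_pqr_spheres(sample))
```

An open file object can be passed directly in place of the list, since it iterates over lines.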
Extra setup files: config.txt and input.prm. Samples are given in the script folder. An example of advanced scripting is provided by scripts/loop.py, together with a sample structure folder containing structure-ligand pairs and a ligandMap.txt file.
In input.prm:
Action = analysis: Stores hitting statistics and features (in a binary file) for all pockets generated with the provided clustering parameters, over several structure-ligand pairs (it can loop over the parameters as well).
Action = test: Evaluates ranking power by looping over structures and ligands.
config.txt is only used to override the default alpha, beta and maximum probe radius clustering parameters. It will be dropped in future implementations.