- Description
- UPDATES
- Web Interface
- Output Example
- Results Screenshots
- Technologies Used and Requirements
- Installation
- Running the Application
- Docking Work Flow
- Docking Parameters explained
- Visualization and Results Generation
- Output File Organization
- Acknowledgments
This repository provides an automated docking solution for ligands and receptor proteins using AutoDock Vina and P2Rank. It supports high-throughput docking workflows and integrates seamlessly with SLURM, a workload manager for distributed computing and queue management. Additionally, the entire functionality is built into a web interface using the Streamlit framework, allowing convenient and intuitive operation from a web browser.
The program is built on open-source libraries and solutions. It implements a user account system that simplifies project management, handling of generated data, and navigation. The installation instructions are detailed enough for beginner users to set up and use the tool. The system is developed and tested on Ubuntu 22.04 as the compute host. The interface can be opened from any web browser, so the program can run locally on an Ubuntu machine or be configured as a server on a local network, accessible from any computer within the same LAN.
I am still working on this project and adding new features. The uploaded repository is fully functional. If you have any suggestions, feel free to contact me, open a discussion, or add a post.
If you encounter any problems with the installation or operation of the docking program, do not hesitate to contact me. I will do my best to assist you.
- Some PDB codes have no corresponding .pdb file because only mmCIF files are available. If the program cannot download a .pdb file, it tries to download the .cif file instead, converts it to .pdb, and passes it on for further processing and docking.
- Automatic selection of the receptor chain did not work reliably because chain naming is not uniform. Now, when PDB codes are given, the structures are retrieved, the list of chains is loaded, and the user selects for each receptor the chain containing the receptor or docking site from a drop-down list. Alternatively, the input can be prepared as a CSV file that contains PDB codes and chain IDs. After the CSV file is loaded, the list of receptors with selected chains is shown; it can be modified, or accepted and passed on for further calculations.
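A CSV file of PDB codes and chain IDs could be read along the following lines. This is only a sketch: the column names `pdb_id` and `chain_id` are assumptions made for illustration, and the layout the application actually expects may differ.

```python
import csv
import io


def read_receptor_chains(csv_text):
    """Parse CSV text into a list of (pdb_id, chain_id) tuples.

    Assumes a header row with 'pdb_id' and 'chain_id' columns
    (hypothetical names, not confirmed by the repository).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    receptors = []
    for row in reader:
        pdb_id = (row.get("pdb_id") or "").strip().upper()
        chain_id = (row.get("chain_id") or "").strip()
        if not pdb_id:
            continue  # skip rows without a PDB code
        receptors.append((pdb_id, chain_id))
    return receptors


example = "pdb_id,chain_id\n8w88,A\n1abc,B\n"
print(read_receptor_chains(example))
```

PDB codes are upper-cased so that `8w88` and `8W88` refer to the same receptor; chain IDs are left untouched because they are case-sensitive.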
The interface is built on the Streamlit framework. After installation and configuration, the application runs entirely from a web browser.
This repository includes a downloadable file, output_example.zip, containing sample output from the docking program.
- Python 3.11: Core scripting language.
- AutoDock Vina v1.2.5: Molecular docking engine.
- P2Rank v2.4.2: Binding site prediction.
- Biopython, RDKit, Open Babel, open-PyMOL: Molecular handling, visualization, and preparation tools.
- SLURM: Workload manager for distributed computing and queue management.
- Streamlit 1.40.2: Frontend web interface.
- Operating System: Ubuntu 22.04 (or other compatible Debian distributions). For advanced users, any Linux distribution can be used, but library and installation package adjustments may be necessary.
biopython
biopandas
pandas
pubchempy
tqdm
matplotlib
scipy
rdkit
pdbfixer
pymol-open-source
streamlit
bcrypt
Java Runtime Environment (JRE)
The installation process is divided into several main stages. The program is configured to operate under a specific user account and name. If you wish to modify this, locate all instances of docking_machine in the dock_GUI.py file and replace them with your desired username. The docking server will be set up on this account.
Create a user account named docking_machine and assign the necessary administrative permissions to install required packages and libraries. These permissions can be revoked after installation.
sudo adduser docking_machine
sudo usermod -aG sudo docking_machine
Log into the new user account:
su - docking_machine
Alternatively, relog into this account by opening a new terminal session if you are working via SSH.
Clone the repository and move it to the preferred directory, e.g., /home/docking_machine/dock:
git clone https://github.com/Prospero1988/AutoDock_vina_pipeline.git
mkdir ~/dock
mv ~/AutoDock_vina_pipeline/* ~/dock/
mv ~/AutoDock_vina_pipeline/.* ~/dock/ 2>/dev/null
rmdir ~/AutoDock_vina_pipeline
Alternatively, you can perform these steps manually using your operating system's graphical interface.
Set access permissions for the dock directory:
chmod -R 755 /home/docking_machine/dock
sudo chown -R docking_machine:docking_machine /home/docking_machine/dock
Navigate to the installation directory and run the installation script. Do not start this script with sudo, so that all packages and libraries are installed under the docking_machine user account, not the root account.
cd /home/docking_machine/dock/installation
chmod +x install.sh
bash install.sh
At this stage, you can already work with the program via the command line by running the init_docking.py Python script. However, this method is not very convenient and does not support the workload manager or the graphical interface/server setup.
Ensure the system is up to date:
sudo apt update && sudo apt upgrade -y
Install the required dependencies:
sudo apt install -y munge libmunge-dev libmunge2 build-essential slurm-wlm slurm-client
Create an authentication key for Munge:
sudo /usr/sbin/create-munge-key
Set appropriate permissions:
sudo chown -R munge: /etc/munge /var/lib/munge /var/log/munge
sudo chmod 700 /etc/munge /var/lib/munge /var/log/munge
Start and enable the Munge service:
sudo systemctl enable munge
sudo systemctl start munge
Verify Munge is working correctly:
munge -n | unmunge
Expected output: Success (0).
Add a dedicated user for SLURM:
sudo useradd -r -m -d /var/lib/slurm -s /bin/false slurm
Create necessary directories:
sudo mkdir -p /var/spool/slurmd /var/log/slurm
Set appropriate permissions:
sudo chown -R slurm: /var/spool/slurmd /var/log/slurm
sudo chmod -R 755 /var/spool/slurmd /var/log/slurm
Check the hostname:
hostname
Edit the SLURM configuration file:
sudo nano /etc/slurm/slurm.conf
Add the following minimal configuration, replacing <YOUR_CLUSTER_NAME> with your preferred cluster name and <YOUR_HOSTNAME> with the hostname obtained earlier:
# Basic Configuration
ClusterName=<YOUR_CLUSTER_NAME>
ControlMachine=<YOUR_HOSTNAME>
# Ports and Authentication
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
# Logging
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldLogFile=/var/log/slurm/slurmctld.log
# Resource Management
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmd
SlurmdSpoolDir=/var/spool/slurmd
ProctrackType=proctrack/pgid
TaskPlugin=task/none
SchedulerType=sched/backfill
# Node Configuration
NodeName=localhost CPUs=16 RealMemory=64000 State=UNKNOWN
# Partition Configuration
PartitionName=main Nodes=ALL Default=YES MaxTime=INFINITE State=UP
Save the file and set appropriate permissions:
sudo chown slurm: /etc/slurm/slurm.conf
sudo chmod 644 /etc/slurm/slurm.conf
Start and enable SLURM services:
sudo systemctl enable slurmctld
sudo systemctl enable slurmd
sudo systemctl start slurmctld
sudo systemctl start slurmd
Check the status of SLURM services:
sudo systemctl status slurmctld
sudo systemctl status slurmd
Expected output: Active: active (running).
Ensure all services start automatically on boot:
sudo systemctl enable munge
sudo systemctl enable slurmctld
sudo systemctl enable slurmd
To ensure the server is always available after a system restart, configure it as a system service using systemd.
Create a streamlit_docking.service file in the /etc/systemd/system directory:
sudo nano /etc/systemd/system/streamlit_docking.service
Add the following configuration, replacing paths as necessary:
[Unit]
Description=Streamlit Docking GUI Service
After=network.target
[Service]
User=docking_machine
Group=docking_machine
WorkingDirectory=/home/docking_machine/dock
Environment="PATH=/home/docking_machine/miniconda/envs/auto_dock/bin"
ExecStart=/home/docking_machine/miniconda/envs/auto_dock/bin/streamlit run dock_GUI.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Set the correct permissions and ownership:
sudo chown root:root /etc/systemd/system/streamlit_docking.service
sudo chmod 644 /etc/systemd/system/streamlit_docking.service
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable streamlit_docking.service
sudo systemctl start streamlit_docking.service
Verify the service status:
sudo systemctl status streamlit_docking.service
Expected output:
● streamlit_docking.service - Streamlit Docking GUI Service
Loaded: loaded (/etc/systemd/system/streamlit_docking.service; enabled; vendor preset: enabled)
Active: active (running) since ...
Open the necessary ports in the firewall:
sudo ufw allow 8001,8501/tcp
sudo ufw status
Test the setup after a system restart:
sudo reboot
After rebooting, verify the service is running:
sudo systemctl status streamlit_docking.service
Access the application via a web browser at:
http://<YOUR_COMPUTER_IP>:8501
To access the Streamlit application using the address http://<YOUR_LAN_IP> without specifying a port, you need to configure a reverse proxy using a web server like NGINX. The reverse proxy will handle requests on the default HTTP port (80) and forward them to Streamlit on port 8501.
sudo apt update
sudo apt install nginx
Create a configuration file for your Streamlit application:
sudo nano /etc/nginx/sites-available/streamlit
Insert the following configuration, replacing <YOUR_LAN_IP> with your actual LAN IP address:
server {
listen 80;
server_name <YOUR_LAN_IP>;
location / {
proxy_pass http://127.0.0.1:8501; # Forward requests to Streamlit
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Handle WebSocket connections (required for Streamlit)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
Create a symbolic link to the sites-enabled directory:
sudo ln -s /etc/nginx/sites-available/streamlit /etc/nginx/sites-enabled/
If the default NGINX configuration is not needed, remove it:
sudo rm /etc/nginx/sites-enabled/default
Test the NGINX configuration to ensure there are no errors:
sudo nginx -t
If everything is correct, restart NGINX:
sudo systemctl restart nginx
To allow HTTP and HTTPS traffic through the firewall, use the following command:
sudo ufw allow 'Nginx Full'
Verify the firewall status:
sudo ufw status
Expected output:
To Action From
-- ------ ----
Nginx Full ALLOW Anywhere
Nginx Full (v6) ALLOW Anywhere (v6)
Open your browser and navigate to:
http://<YOUR_LAN_IP>
NGINX runs as a system service and is normally enabled to start at boot right after installation. You can verify this or enable it manually.
Run the following command to ensure NGINX is enabled and starts on boot:
sudo systemctl is-enabled nginx
The expected output is:
enabled
If the previous command returns disabled, you can enable NGINX:
sudo systemctl enable nginx
Check if NGINX is active:
sudo systemctl status nginx
If everything is fine, you should see a message similar to:
● nginx.service - A high performance web server and a reverse proxy server
Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
Active: active (running) since ...
Because NGINX is enabled as a service, it starts automatically at every system boot without manual intervention. As long as the Streamlit service is also running, the application remains reachable at http://<YOUR_LAN_IP> after each restart.
To access the application on the local computer where the server is running, open your web browser and navigate to:
http://localhost:8501
Alternatively, you can use:
http://127.0.0.1:8501
To access the application from other computers within your Local Area Network (LAN):
- Ensure NGINX is Properly Configured: As outlined in the NGINX Reverse Proxy Setup for Streamlit section, NGINX should be set up to forward requests from port 80 to Streamlit's port 8501.
- Open Firewall Ports: Ensure that ports 80 (HTTP) and 443 (HTTPS) are open in your firewall to allow incoming traffic. This was covered in the firewall step of the NGINX setup.
- Determine Server's LAN IP Address: Find the LAN IP address of the server hosting the application by running:
hostname -I
Suppose the LAN IP is 192.168.1.100.
- Access from Other Devices: On any device within the same LAN, open a web browser and navigate to:
http://192.168.1.100
Replace 192.168.1.100 with your server's actual LAN IP address.
The Docking Program provides the following functionalities:
- User Authentication: Secure login and registration system with password hashing using bcrypt.
- Project Management: Create, manage, and delete docking projects, each containing receptor proteins and ligand files.
- Docking Submission: Configure docking parameters and submit docking jobs to the SLURM workload manager for high-throughput molecular docking.
- Queue Monitoring: View the current SLURM job queue and monitor the status of submitted jobs.
- Results Visualization and Download: Access interactive results generated by the docking process and download them as CSV files.
- PyMOL Installation Guide: Access a guide for installing Open-PyMOL on Windows for molecular visualization.
- Login/Register:
  - Open the application in your web browser using either http://localhost:8501 or http://<YOUR_LAN_IP>.
  - If you already have an account, enter your username and password and click "Login".
  - To create a new account, click "Add New User", enter your desired username and password, and click "Register".
- Main Menu:
  - Upon successful login, you will be greeted with the main menu, where you can select from the following modules:
    - DOCKING: Configure and submit docking jobs.
    - QUEUE: View and manage SLURM job queue.
    - SHOW RESULTS: View interactive results of docking simulations.
    - DOWNLOAD RESULTS: Download docking results.
    - DELETE RESULTS: Delete existing projects and results.
    - PyMOL Installation GUIDE: Access the guide for installing Open-PyMOL on Windows.
    - LOG OUT: Log out of the application.
- DOCKING Module:
  - Project Setup: Enter a project name to create a new docking project.
  - Enter PDB Codes: Provide PDB codes for receptor proteins, either by typing them in or uploading a CSV file.
  - Upload Ligand Files: Upload ligand files in .mol2 or .SDF formats.
  - Docking Parameters: Configure docking parameters as needed.
  - Project Summary: Review your project details before submitting the docking job.
  - Start Docking: Submit the docking job to SLURM for processing.
- QUEUE Module:
  - View the current SLURM job queue, including job IDs, users, job names, states, time used, and start times.
  - Refresh the queue status as needed.
- SHOW RESULTS Module:
  - Select a project and receptor to view interactive docking results.
  - Access interactive results in a new browser tab.
  - Download results as CSV files.
- DOWNLOAD RESULTS Module:
  - Select projects to download their results as a ZIP archive.
- DELETE RESULTS Module:
  - Select projects to delete their associated data and results.
- PyMOL Installation GUIDE:
  - Access a detailed guide for installing Open-PyMOL on Windows to visualize molecular structures.
- LOG OUT:
  - Log out of your account securely.
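The queue listing shown by the QUEUE module (job IDs, users, job names, states, time used, start times) could be collected from SLURM roughly as follows. This is a minimal sketch, not the application's own code: the `|` separator, the field names, and the helper functions are assumptions made for illustration. The `squeue` format specifiers `%i`, `%u`, `%j`, `%T`, `%M`, and `%S` are standard.

```python
import subprocess

# Job ID, user, job name, state, time used, start time, separated by "|".
SQUEUE_FORMAT = "%i|%u|%j|%T|%M|%S"
FIELDS = ("job_id", "user", "name", "state", "time_used", "start_time")


def parse_squeue(text):
    """Turn squeue output (one job per line) into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed lines
        jobs.append(dict(zip(FIELDS, parts)))
    return jobs


def read_queue():
    """Run squeue without a header line and parse its output."""
    result = subprocess.run(
        ["squeue", "-h", "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_squeue(result.stdout)


sample = "12|docking_machine|dock_8W88|RUNNING|0:42|2024-12-01T10:00:00\n"
print(parse_squeue(sample))
```

Keeping the parser separate from the `subprocess` call lets it be checked on sample text without a running SLURM installation. A job name that itself contains `|` would be skipped by this sketch.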
The Streamlit application is defined in dock_GUI.py. Ensure that this script is properly installed and configured during the installation process (as described in the Installation section). The Streamlit server is managed as a system service, so it starts automatically on system boot.
You can also manually run the application by navigating to the installation directory and executing:
streamlit run dock_GUI.py
However, it's recommended to use the system service configuration for seamless operation.
The script accepts the following arguments:
- --pdb_ids: A CSV file located in the ./receptors directory, containing the PDB IDs of receptor proteins. Each ID corresponds to a unique protein structure available in the Protein Data Bank (PDB).
- --ligands: A ligand file located in the ./ligands directory. Supported formats include SDF and MOL2 files, allowing flexibility in ligand input.
Tolerance Parameters:
- --tol_x: Tolerance in Ångströms to expand the docking pocket dimension along the X-axis beyond those defined by P2Rank (default: 0.0).
- --tol_y: Tolerance in Ångströms to expand the docking pocket dimension along the Y-axis beyond those defined by P2Rank (default: 0.0).
- --tol_z: Tolerance in Ångströms to expand the docking pocket dimension along the Z-axis beyond those defined by P2Rank (default: 0.0).
Offset Parameters:
- --offset_x: Offset in Ångströms to shift the center of the docking grid box along the X-axis (optional, default: 0.0).
- --offset_y: Offset in Ångströms to shift the center of the docking grid box along the Y-axis (optional, default: 0.0).
- --offset_z: Offset in Ångströms to shift the center of the docking grid box along the Z-axis (optional, default: 0.0).

- --pckt: Pocket number to use from P2Rank predictions (default: 1).
- --exhaust: Specifies how thorough the search should be for the best binding poses. Higher values increase precision but require more computation time (default: 16).
- --energy_range: Determines the range of energy scores (in kcal/mol) for poses to be considered (default: 4).
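The arguments and defaults listed above map naturally onto an `argparse` definition. The sketch below is an illustration under assumptions, not the script's actual source: the argument types and the choice to make `--pdb_ids` and `--ligands` required are guesses.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="Automated docking (sketch)")
    # Input files; treating them as required is an assumption.
    parser.add_argument("--pdb_ids", required=True, help="CSV file with PDB IDs")
    parser.add_argument("--ligands", required=True, help="SDF or MOL2 ligand file")
    # Per-axis tolerances and offsets, all in Angstroms, default 0.0.
    for axis in "xyz":
        parser.add_argument(f"--tol_{axis}", type=float, default=0.0)
        parser.add_argument(f"--offset_{axis}", type=float, default=0.0)
    parser.add_argument("--pckt", type=int, default=1, help="P2Rank pocket number")
    parser.add_argument("--exhaust", type=int, default=16, help="search thoroughness")
    parser.add_argument("--energy_range", type=float, default=4.0, help="kcal/mol")
    return parser


args = build_parser().parse_args(
    ["--pdb_ids", "receptors.csv", "--ligands", "ligands.sdf", "--tol_x", "2.5"]
)
print(args.tol_x, args.pckt, args.exhaust, args.energy_range)
```

Any option that is not given on the command line falls back to the default documented above.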
Automatic Ligand Naming:
In cases where ligands in the input files lack explicit names, the script assigns them generic names in the format ligand_001, ligand_002, etc., ensuring consistent and organized output.
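The naming scheme can be illustrated with a short helper. This is a sketch under one assumption: the counter advances only for unnamed ligands. The script may instead number ligands by their position in the file.

```python
def assign_ligand_names(names):
    """Replace missing or blank names with ligand_001, ligand_002, ..."""
    result = []
    counter = 0
    for name in names:
        if name is None or not name.strip():
            counter += 1
            result.append(f"ligand_{counter:03d}")  # zero-padded to 3 digits
        else:
            result.append(name.strip())
    return result


print(assign_ligand_names(["aspirin", "", None, "caffeine"]))
# ['aspirin', 'ligand_001', 'ligand_002', 'caffeine']
```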
Download Receptor Structures:
For each PDB ID listed in the CSV file, the script downloads the corresponding protein structure from the Protein Data Bank (PDB). The downloaded file is saved as <PDB_ID>_dirty.pdb in a newly created folder named after the receptor (e.g., ./8W88/).
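A download step that tries the .pdb file first and falls back to .cif might look like the sketch below. The RCSB download URL pattern is an assumption about which endpoint the program uses, and `download_structure` with its injectable `fetch` argument is a hypothetical helper introduced here for illustration. Converting a .cif result to .pdb is not shown.

```python
import urllib.error
import urllib.request

# Assumed endpoint; the program may use a different source or library.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.{ext}"


def fetch_url(url):
    """Return the body of `url` as bytes (raises URLError on failure)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_structure(pdb_id, fetch=fetch_url):
    """Try the .pdb file first, then fall back to .cif.

    Returns (extension, data) for the first format that downloads.
    """
    last_error = None
    for ext in ("pdb", "cif"):
        url = RCSB_URL.format(pdb_id=pdb_id.upper(), ext=ext)
        try:
            return ext, fetch(url)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            last_error = err  # remember the failure and try the next format
    raise RuntimeError(f"No structure file found for {pdb_id}") from last_error
```

Calling `download_structure("8W88")` performs a real network request; passing a custom `fetch` function allows the fallback logic to be exercised offline.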
Fixing the Receptor:
Using PDBFixer, the script:
- Retains only the chain with the maximum number of residues.
- Removes heteroatoms and water molecules.
- Adds missing residues, atoms, and hydrogens based on a physiological pH of 7.4.
The fixed structure is saved as <PDB_ID>_fixed.pdb.
Receptor Conversion:
The fixed PDB structure is converted to the .pdbqt format required by AutoDock Vina. The converted file is saved as <PDB_ID>.pdbqt.
- The script utilizes P2Rank to predict potential binding sites (pockets) on the receptor.
  - The predictions are saved in a folder named 01_p2rank_output within the receptor's directory.
  - A CSV file (<PDB_ID>_predictions.csv) lists each pocket's coordinates, size, and scores.
- The selected pocket (based on the --pckt argument) is used to define the docking box dimensions. This includes the center coordinates (center_x, center_y, center_z) and sizes (size_x, size_y, size_z) with optional tolerances (--tol_x, --tol_y, --tol_z) for each axis. Additionally, optional offsets (--offset_x, --offset_y, --offset_z) allow for independent shifting of the docking grid center along each axis.
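The way tolerances and offsets modify the box can be sketched as below. One assumption is made explicit: each tolerance is added to the total edge length along its axis. The script may apply it differently (for example, per side), so treat the arithmetic as illustrative.

```python
from dataclasses import dataclass


@dataclass
class DockingBox:
    center_x: float
    center_y: float
    center_z: float
    size_x: float
    size_y: float
    size_z: float


def build_box(center, size, tol=(0.0, 0.0, 0.0), offset=(0.0, 0.0, 0.0)):
    """Shift the pocket center by `offset` and grow its size by `tol`.

    All values are in Angstroms; negative tolerances shrink the box.
    """
    cx, cy, cz = (c + o for c, o in zip(center, offset))
    sx, sy, sz = (s + t for s, t in zip(size, tol))
    if min(sx, sy, sz) <= 0:
        raise ValueError("tolerances shrink the box to zero or below")
    return DockingBox(cx, cy, cz, sx, sy, sz)


box = build_box(
    center=(10.0, 5.0, -2.0), size=(20.0, 18.0, 22.0),
    tol=(2.0, 0.0, -4.0), offset=(1.0, 0.0, 0.5),
)
print(box)
```

With all tolerances and offsets left at 0.0, the box is exactly the one derived from the P2Rank pocket.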
- For each ligand in the provided SDF or MOL2 file:
  - Format Handling:
    - The script automatically detects the file format (SDF or MOL2) and processes it accordingly.
  - Conversion and Processing:
    - The ligand is converted to .pdb format using RDKit.
    - Hydrogen atoms are added, and a 3D conformer is generated for the ligand.
    - The .pdb file is converted to the .pdbqt format required for docking using Open Babel.
- Output Organization:
  - The prepared ligand files are stored in the 02_ligands_results subdirectory within the receptor's folder (e.g., ./8W88/02_ligands_results/).
- The script runs AutoDock Vina for each receptor-ligand pair:
  - The docking box is defined using P2Rank predictions, with optional tolerances and offsets.
  - Parameters such as --exhaust (exhaustiveness) and --energy_range control the thoroughness and energy tolerance for pose scoring.
  - Docking results are saved in .pdbqt format, and key details (e.g., binding affinities) are extracted from the output.
- Post-Docking File Management:
  - All .pdbqt files for ligands after docking are collectively copied into the 03_ligands_PDBQT folder, facilitating easy access without navigating through individual folders.
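Binding affinities can be read from a Vina output file because each docked pose carries a `REMARK VINA RESULT:` line whose first number is the affinity in kcal/mol. The helper below is a minimal sketch of that extraction; `parse_vina_affinities` is a name chosen for illustration, and the script's own parsing may differ.

```python
def parse_vina_affinities(pdbqt_text):
    """Return the affinity (kcal/mol) of every pose in a Vina output file."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields: REMARK, VINA, RESULT:, affinity, RMSD l.b., RMSD u.b.
            affinities.append(float(line.split()[3]))
    return affinities


sample = (
    "MODEL 1\n"
    "REMARK VINA RESULT:    -7.483      0.000      0.000\n"
    "ENDMDL\n"
    "MODEL 2\n"
    "REMARK VINA RESULT:    -7.102      1.532      2.871\n"
    "ENDMDL\n"
)
print(parse_vina_affinities(sample))
# [-7.483, -7.102]
```

Vina writes poses in order of increasing energy, so the first value is the best-scoring pose.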
You can run the docking without changing/defining these parameters. Default values will be used. To view the docking box, it's best to open any resulting .pse file in PyMOL. This file contains a drawn grid box along with the coordinate axes.
- --tol_x, --tol_y, --tol_z: Docking box tolerances in Ångströms along the X, Y, and Z axes respectively (default: 0.0 for each). They define how much to expand the docking pocket along each axis beyond the predictions made by P2Rank. Negative values indicate contraction.
- --offset_x, --offset_y, --offset_z: Offsets in Ångströms to shift the center of the docking grid box along the X, Y, and Z axes respectively (default: 0.0 for each).
- --pckt: Pocket number from P2Rank predictions (default: 1).
- --exhaust: Docking thoroughness (default: 16).
- --energy_range: Energy range for docking poses (default: 4 kcal/mol).
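To show where these values end up, the sketch below assembles an AutoDock Vina command line. The Vina options used (`--receptor`, `--ligand`, `--out`, `--center_*`, `--size_*`, `--exhaustiveness`, `--energy_range`) are standard Vina 1.2 flags; the executable name `vina`, the file names, and the `build_vina_command` helper are assumptions for illustration.

```python
def build_vina_command(receptor, ligand, out, center, size,
                       exhaust=16, energy_range=4, vina_bin="vina"):
    """Return the Vina invocation as a list suitable for subprocess.run."""
    cmd = [vina_bin, "--receptor", receptor, "--ligand", ligand, "--out", out]
    # One --center_* / --size_* pair per axis, values in Angstroms.
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    cmd += ["--exhaustiveness", str(exhaust), "--energy_range", str(energy_range)]
    return cmd


command = build_vina_command(
    "8W88.pdbqt", "ligand_001.pdbqt", "ligand_001_out.pdbqt",
    center=(11.0, 5.0, -1.5), size=(22.0, 18.0, 18.0),
)
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where Vina is installed.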
PyMOL is used to generate visualizations of the best-docked ligand poses superimposed on the receptor structure. The visualizations include the docking grid (box) and XYZ axes for better spatial orientation. Both high-resolution images (.png) and PyMOL session files (.pse) are saved for each docking result, so users can explore the results interactively within PyMOL.
The script creates an interactive HTML report for each receptor, summarizing:
- Key docking metrics (binding energies, pocket scores).
- Links to output files (e.g., .pdbqt and .txt).
- 2D and 3D visualizations of ligand-receptor complexes.
- New: An additional column with links to the PyMOL session files (.pse), enabling users to open and manipulate the docking results directly in PyMOL.
In addition to the HTML report, a CSV summary file is generated containing:
- Name: Ligand name.
- Affinity: Binding affinity values.
- SMILES: Simplified molecular-input line-entry system representations of ligands.
This CSV file provides a convenient overview of docking results for further analysis.
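A summary with these three columns could be written as sketched below. The column names follow the list above; sorting rows by affinity (most negative first) is an assumption added for readability and may not match the script's ordering.

```python
import csv
import io


def write_summary(rows, handle):
    """Write Name, Affinity, SMILES rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["Name", "Affinity", "SMILES"])
    writer.writeheader()
    # Lower (more negative) affinity means stronger predicted binding.
    for row in sorted(rows, key=lambda r: r["Affinity"]):
        writer.writerow(row)


rows = [
    {"Name": "caffeine", "Affinity": -5.4, "SMILES": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},
    {"Name": "aspirin", "Affinity": -6.1, "SMILES": "CC(=O)OC1=CC=CC=C1C(=O)O"},
]
buffer = io.StringIO()
write_summary(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target file with `newline=""` and pass the handle in place of the `StringIO` buffer.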
In each project directory, there are subfolders corresponding to each receptor. Within these receptor-specific subfolders, you can find the results from the docking simulations. The organization of these subfolders is as follows:
Each receptor has its dedicated directory containing:
Processed Structures:
- <PDB_ID>_dirty.pdb: Raw receptor structure.
- <PDB_ID>_fixed.pdb: Cleaned receptor structure.
- <PDB_ID>.pdbqt: Receptor ready for docking.
Docking Results:
- 02_ligands_results/:
  - <ligand_name>.pdbqt: Prepared ligand.
  - <ligand_name>.svg: 2D ligand structure images.
  - <PDB_ID>_<ligand_name>_docking.pse: PyMOL session files including the docking grid and XYZ axes.
- 03_ligands_PDBQT/:
  - All docked ligand .pdbqt files copied here for easy access.
Visualizations:
- <PDB_ID>_<ligand_name>_docking.png: High-resolution 3D visualizations of docked complexes.
Reports:
- <PDB_ID>_results.html: Interactive HTML report summarizing docking results, including links to PyMOL session files.
- <PDB_ID>_results_in_CSV.csv: CSV file containing ligand name, affinity, and SMILES.
P2Rank Predictions:
- 01_p2rank_output/<PDB_ID>_predictions.csv: Binding site information.
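For scripts that post-process a finished project, the layout above can be turned into paths with `pathlib`. This sketch covers only the files whose location is stated explicitly; it assumes the reports sit directly in the receptor folder, and `receptor_outputs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def receptor_outputs(project_dir, pdb_id):
    """Map descriptive keys to the expected output paths for one receptor."""
    root = Path(project_dir) / pdb_id
    return {
        "raw": root / f"{pdb_id}_dirty.pdb",
        "fixed": root / f"{pdb_id}_fixed.pdb",
        "receptor_pdbqt": root / f"{pdb_id}.pdbqt",
        "pockets": root / "01_p2rank_output" / f"{pdb_id}_predictions.csv",
        "ligand_results": root / "02_ligands_results",
        "docked_ligands": root / "03_ligands_PDBQT",
        "report_html": root / f"{pdb_id}_results.html",
        "report_csv": root / f"{pdb_id}_results_in_CSV.csv",
    }


paths = receptor_outputs("my_project", "8W88")
for key, path in paths.items():
    print(f"{key}: {path.as_posix()}")
```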