
Uni-Dock-Benchmarks contains a curated collection of datasets and benchmarking tests for evaluating the performance and accuracy of the Uni-Dock docking system. This repository is intended for use in continuous integration testing and for researchers seeking to compare docking results with established benchmarks.


dptech-corp/Uni-Dock-Benchmarks


Uni-Dock-Benchmarks

The Uni-Dock-Benchmarks repository provides a comprehensive collection of datasets for benchmarking the Uni-Dock docking system's performance and accuracy.

Data

Benchmark data within the repository is categorized into two primary sections:

  • molecular_docking
  • virtual_screening

Molecular Docking Benchmarks

Under the molecular_docking directory, you will find three well-known benchmark datasets:

  • Astex
  • CASF-2016
  • PoseBuster

We performed the following preparation steps for the proteins and ligands in the datasets.

  • After obtaining the protein structures from the RCSB database by PDB code, we retained the crystal waters and cofactors that affect the binding mode, then completed missing protein side chains and added missing hydrogen atoms.
  • For ligands, we looked up the isomeric SMILES corresponding to each PDB code in the RCSB database and assigned the protonation state according to the receptor pocket environment. We then generated 3D conformations for each ligand.

After excluding systems with failed preparation and those with large natural products or polypeptide ligands, 84 systems from Astex, 271 systems from CASF-2016 and 428 systems from PoseBuster were used as benchmarks.

The directory structure for each dataset is as follows:

<DataSetName>
├── <PDB_ID>
│   ├── <PDB_ID>_ligand_ori.sdf  # Original ligand structure in SDF format
│   ├── <PDB_ID>_ligand_prep.sdf # Prepared ligand in SDF format
│   ├── <PDB_ID>_receptor.pdb    # Receptor in PDB format
│   ├── <PDB_ID>_receptor.pdbqt  # Receptor ready for docking in PDBQT format
│   └── docking_grid.json        # Docking box configuration in JSON format
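
The layout above can be checked programmatically before running a benchmark. The following is a minimal sketch, not part of the repository: the function name index_dataset and its return structure are hypothetical, and the root path you pass in depends on where you cloned the data. Because the keys inside docking_grid.json are not described here, the file is loaded as generic JSON without assuming any field names.

```python
import json
from pathlib import Path

# File-name templates taken from the directory layout above.
EXPECTED = (
    "{pdb}_ligand_ori.sdf",
    "{pdb}_ligand_prep.sdf",
    "{pdb}_receptor.pdb",
    "{pdb}_receptor.pdbqt",
    "docking_grid.json",
)


def index_dataset(dataset_dir):
    """Map each <PDB_ID> folder to its file paths, missing files, and parsed grid.

    index_dataset is a hypothetical helper, not a function from Uni-Dock.
    """
    systems = {}
    for system_dir in sorted(Path(dataset_dir).iterdir()):
        if not system_dir.is_dir():
            continue
        pdb = system_dir.name
        paths = {t.format(pdb=pdb): system_dir / t.format(pdb=pdb) for t in EXPECTED}
        missing = [name for name, path in paths.items() if not path.is_file()]
        grid_path = paths["docking_grid.json"]
        # The JSON keys are not documented here, so the content is kept as-is.
        grid = json.loads(grid_path.read_text()) if grid_path.is_file() else None
        systems[pdb] = {"paths": paths, "missing": missing, "grid": grid}
    return systems
```

Calling index_dataset on one dataset folder (for example, the Astex folder under molecular_docking) and filtering for entries with a non-empty "missing" list gives a quick completeness check.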

Virtual Screening Benchmarks

Under the virtual_screening directory, you will find five selected benchmark datasets: D4, GBA, NSP3, PPARG, and sigma2.

The following table summarizes the statistics of the datasets:

Dataset   PDB ID   N_Actives   N_Inactives   N_Total
D4        5WIU           226           598       824
GBA       5LVX           286       458,205   458,491
NSP3      5RS7            65         3,515     3,580
PPARG     5Y2T            29         7,292     7,321
sigma2    7M94           228           596       824
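
The datasets differ widely in how rare the actives are, which matters when comparing screening results across targets. The sketch below is illustrative only: it copies the numbers from the table, checks that N_Actives + N_Inactives equals N_Total for each row, and prints the fraction of actives, which is the hit rate expected from picking molecules at random. The names DATASETS and active_fraction are hypothetical.

```python
# (dataset, PDB ID, N_Actives, N_Inactives, N_Total), copied from the table above.
DATASETS = [
    ("D4", "5WIU", 226, 598, 824),
    ("GBA", "5LVX", 286, 458_205, 458_491),
    ("NSP3", "5RS7", 65, 3_515, 3_580),
    ("PPARG", "5Y2T", 29, 7_292, 7_321),
    ("sigma2", "7M94", 228, 596, 824),
]


def active_fraction(n_actives, n_total):
    """Fraction of the library that is active (random-selection hit rate)."""
    return n_actives / n_total


for name, pdb_id, n_act, n_inact, n_total in DATASETS:
    # Sanity check: the three count columns must be consistent.
    assert n_act + n_inact == n_total, f"inconsistent counts for {name}"
    pct = 100 * active_fraction(n_act, n_total)
    print(f"{name:7s} {pdb_id}  actives: {pct:.3f}% of {n_total} molecules")
```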

The directory structure for each dataset is as follows:

<DataSetName>
├── docking_grid.json          # Docking box configuration in JSON format
├── <PDB_ID>_receptor.pdbqt    # Receptor ready for docking in PDBQT format
├── <PDB_ID>_receptor.pdb      # Receptor in PDB format
├── inactives.sdf              # Inactive molecules in SDF format
└── actives.sdf                # Active molecules in SDF format
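
To confirm that a downloaded dataset matches the statistics table, the molecules in actives.sdf and inactives.sdf can be counted without any cheminformatics dependency. This sketch relies only on the SDF convention that each record ends with a line starting with "$$$$". The function names are hypothetical, and the dataset path is whatever location you cloned the repository to.

```python
from pathlib import Path


def count_sdf_records(sdf_path):
    """Count molecules in an SDF file by counting '$$$$' record terminators."""
    count = 0
    # Stream line by line so large files (such as the GBA inactives) fit in memory.
    with open(sdf_path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith("$$$$"):
                count += 1
    return count


def dataset_counts(dataset_dir):
    """Return (n_actives, n_inactives, n_total) for one virtual-screening dataset."""
    dataset_dir = Path(dataset_dir)
    n_actives = count_sdf_records(dataset_dir / "actives.sdf")
    n_inactives = count_sdf_records(dataset_dir / "inactives.sdf")
    return n_actives, n_inactives, n_actives + n_inactives
```

The returned tuple can be compared directly with the N_Actives, N_Inactives, and N_Total columns for the corresponding dataset.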

ATTENTION: Because the GBA dataset contains so many inactive molecules, its inactives.sdf file exceeds GitHub's file size limit. Please download it from GBA-inactives and unzip it before use.
