Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PDBQT supplier #99

Merged
merged 12 commits into from
Nov 17, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8, 3.9, "3.10"]
rdkit-version: ["rdkit>=2020.09.1"]
python-version: [3.9, "3.10"]
rdkit-version: ["rdkit>2021.03.1"]
include:
- name: Min version
os: ubuntu-latest
python-version: 3.8
rdkit-version: "rdkit=2020.03.1 boost-cpp=1.72.0=h359cf19_6"
rdkit-version: "rdkit=2021.03.1"

steps:
- uses: actions/checkout@v2
Expand Down Expand Up @@ -70,9 +70,9 @@ jobs:
pytest --color=yes --disable-pytest-warnings --cov=prolif --cov-report=xml tests/

- name: Measure tests coverage
uses: codecov/codecov-action@v1
uses: codecov/codecov-action@v3
with:
file: coverage.xml
files: ./coverage.xml
fail_ci_if_error: True
verbose: True

Expand Down
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Metal ligand: exclude amides and some amines.
- The Pi stacking interactions have been changed for a more accurate implementation
(PR #97).
- The `pdbqt_supplier` will not add explicit hydrogen atoms anymore, to avoid detecting
hydrogen bonds with "random" hydrogens that weren't in the PDBQT file.
- When using the `pdbqt_supplier`, irrelevant warnings and logs have been disabled.
- Updated the minimal RDKit version to `2021.03.1`

### Fixed
- Dead link in the quickstart notebook for the MDAnalysis quickstart (PR #75, @radifar).
- The `pdbqt_supplier` now correctly preserves hydrogens from the input PDBQT file.

## [1.0.0] - 2022-06-07
### Added
Expand Down
11 changes: 8 additions & 3 deletions docs/notebooks/how-to.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -604,9 +604,7 @@
"Next, we'll use the PDBQT supplier which loads each file from a list of paths, and assigns bond orders and charges using the template. The template and PDBQT file must have the exact same atoms, **even hydrogens**, otherwise no match will be found. Since PDBQT files partially keep the hydrogen atoms, we have the choice between:\n",
"\n",
"* Manually selecting where to add the hydrogens on the template, do the matching, then add the remaining hydrogens (not covered here)\n",
"* Or just remove the hydrogens from the PDBQT file, do the matching, then add all hydrogens.\n",
"\n",
"This last option will delete the coordinates of your hydrogens atoms and replace them by the ones generated by RDKit, but unless you're working with an exotic system this should be fine.\n",
"* Or just remove the hydrogens from the PDBQT file, do the matching, then re-add the original hydrogens.\n",
"\n",
"For the protein, there's usually no need to load the PDBQT that was used by Vina. The original file that was used to generate the PDBQT can be used directly, but **it must contain explicit hydrogen atoms**:"
]
Expand All @@ -626,6 +624,13 @@
"df = fp.to_dataframe()\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
Expand Down
2 changes: 1 addition & 1 deletion docs/source/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Installation
Requirements:

* Python 3.8+
* `RDKit <https://www.rdkit.org/docs/>`_ (2020.03+)
* `RDKit <https://www.rdkit.org/docs/>`_ (2021.03+)
* `MDAnalysis <https://www.mdanalysis.org/>`_ (2.2+)
* `Pandas <https://pandas.pydata.org/>`_ (1.0+)
* `NumPy <https://numpy.org/>`_
Expand Down
67 changes: 58 additions & 9 deletions prolif/molecule.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
=======================================================
"""
import copy
from collections import defaultdict
from collections.abc import Sequence
from operator import attrgetter

Expand All @@ -12,7 +13,7 @@

from .rdkitmol import BaseRDKitMol
from .residue import Residue, ResidueGroup
from .utils import split_mol_by_residues
from .utils import catch_rdkit_logs, catch_warning, split_mol_by_residues


class Molecule(BaseRDKitMol):
Expand Down Expand Up @@ -173,7 +174,7 @@ def n_residues(self):


class pdbqt_supplier(Sequence):
"""Supplies molecules, given a path to PDBQT files
"""Supplies molecules, given paths to PDBQT files

Parameters
----------
Expand Down Expand Up @@ -210,6 +211,15 @@ class pdbqt_supplier(Sequence):
.. versionchanged:: 1.0.0
Molecule suppliers are now sequences that can be reused, indexed,
and can return their length, instead of single-use generators.

.. versionchanged:: 1.1.0
Because the PDBQT supplier needs to strip hydrogen atoms before
assigning bond orders from the template, it used to replace them
entirely with hydrogens containing new coordinates. It now directly
uses the hydrogen atoms present in the file and won't add explicit
ones anymore, to prevent the fingerprint from detecting hydrogen bonds
with "random" hydrogen atoms.
A lot of irrelevant warnings and logs have been disabled as well.

"""
def __init__(self, paths, template, converter_kwargs=None, **kwargs):
Expand All @@ -229,22 +239,61 @@ def __getitem__(self, index):
return self.pdbqt_to_mol(pdbqt_path)

def pdbqt_to_mol(self, pdbqt_path):
pdbqt = mda.Universe(pdbqt_path)
with catch_warning(message=r"^Failed to guess the mass"):
pdbqt = mda.Universe(pdbqt_path)
# set attributes needed by the converter
elements = [mda.topology.guessers.guess_atom_element(x)
for x in pdbqt.atoms.names]
pdbqt.add_TopologyAttr("elements", elements)
pdbqt.add_TopologyAttr("chainIDs", pdbqt.atoms.segids)
pdbqt.atoms.types = pdbqt.atoms.elements
# convert without infering bond orders and charges
mol = pdbqt.atoms.convert_to.rdkit(NoImplicit=False,
**self.converter_kwargs)
# assign BO from template then add hydrogens
mol = Chem.RemoveHs(mol, sanitize=False)
mol = AssignBondOrdersFromTemplate(self.template, mol)
mol = Chem.AddHs(mol, addCoords=True, addResidueInfo=True)
with catch_rdkit_logs(), catch_warning(
message=r"^(Could not sanitize molecule)|"
r"(No `bonds` attribute in this AtomGroup)"
):
pdbqt_mol = pdbqt.atoms.convert_to.rdkit(
NoImplicit=False, **self.converter_kwargs
)
mol = self._adjust_hydrogens(self.template, pdbqt_mol)
return Molecule.from_rdkit(mol, **self._kwargs)

@staticmethod
def _adjust_hydrogens(template, pdbqt_mol):
# remove explicit hydrogens and assign BO from template
pdbqt_noH = Chem.RemoveAllHs(pdbqt_mol, sanitize=False)
with catch_rdkit_logs():
mol = AssignBondOrdersFromTemplate(template, pdbqt_noH)
# mapping between pdbindex of atom bearing H --> H atom(s)
atoms_with_hydrogens = defaultdict(list)
for atom in pdbqt_mol.GetAtoms():
if atom.GetAtomicNum() == 1:
atoms_with_hydrogens[
atom.GetNeighbors()[0].GetIntProp("_MDAnalysis_index")
].append(atom)
# mapping between atom that should be bearing a H in RWMol and corresponding H(s)
reverse_mapping = {}
for atom in mol.GetAtoms():
if (idx := atom.GetIntProp("_MDAnalysis_index")) in atoms_with_hydrogens:
reverse_mapping[atom.GetIdx()] = atoms_with_hydrogens[idx]
atom.SetNumExplicitHs(0)
# add missing Hs
pdb_conf = pdbqt_mol.GetConformer()
rwmol = Chem.RWMol(mol)
mol_conf = rwmol.GetConformer()
for atom_idx, hydrogens in reverse_mapping.items():
for hydrogen in hydrogens:
h_idx = rwmol.AddAtom(hydrogen)
xyz = pdb_conf.GetAtomPosition(hydrogen.GetIdx())
mol_conf.SetAtomPosition(h_idx, xyz)
rwmol.AddBond(atom_idx, h_idx, Chem.BondType.SINGLE)
mol = rwmol.GetMol()
# sanitize
mol.UpdatePropertyCache()
Chem.SanitizeMol(mol)
# mol = Chem.AddHs(mol, addCoords=True, addResidueInfo=False)
return mol

def __len__(self):
return len(self.paths)

Expand Down
24 changes: 24 additions & 0 deletions prolif/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,18 @@
Helper functions --- :mod:`prolif.utils`
========================================
"""
import warnings
from collections import defaultdict
from collections.abc import Iterable
from contextlib import contextmanager
from copy import deepcopy
from functools import wraps
from importlib.util import find_spec
from math import pi

import numpy as np
import pandas as pd
from rdkit import rdBase
from rdkit.Chem import FragmentOnBonds, GetMolFrags, SplitMolByPDBResidues
from rdkit.DataStructs import ExplicitBitVect
from rdkit.Geometry import Point3D
Expand All @@ -34,6 +37,27 @@ def wrapper(*args, **kwargs):
return inner


@contextmanager
def catch_rdkit_logs():
log_status = rdBase.LogStatus()
rdBase.DisableLog("rdApp.*")
yield
log_status = {st.split(":")[0]: st.split(":")[1] for st in log_status.split("\n")}
log_status = {k: True if v == "enabled" else False for k, v in log_status.items()}
for k, v in log_status.items():
if v is True:
rdBase.EnableLog(k)
else:
rdBase.DisableLog(k)


@contextmanager
def catch_warning(**kwargs):
with warnings.catch_warnings():
warnings.filterwarnings("ignore", **kwargs)
yield


def get_centroid(coordinates):
"""Centroid for an array of XYZ coordinates"""
return np.mean(coordinates, axis=0)
Expand Down
40 changes: 32 additions & 8 deletions tests/test_molecule.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import pytest
from copy import deepcopy
from MDAnalysis import SelectionError
from numpy.testing import assert_array_equal
from prolif.datafiles import datapath
Expand Down Expand Up @@ -79,14 +80,11 @@ def test_monomer_info(self, suppl):
resid = ResidueId.from_atom(mol.GetAtomWithIdx(0))
assert resid == self.resid

@pytest.mark.parametrize("index", [0, 2, 8, -1])
def test_index(self, suppl, index):
index %= 9
mol_i = suppl[index]
for i, mol in enumerate(suppl):
if i == index:
break
assert_array_equal(mol.xyz, mol_i.xyz)
def test_index(self, suppl):
mols = list(suppl)
for index in [0, 2, 8, -1]:
mol_i = suppl[index]
assert_array_equal(mols[index].xyz, mol_i.xyz)


class TestPDBQTSupplier(SupplierBase):
Expand All @@ -101,6 +99,32 @@ def suppl(self):
"(c34)CC21")
return pdbqt_supplier(pdbqts, template)

def test_pdbqt_hydrogens_stay_in_mol(self):
template = Chem.RemoveHs(ligand_rdkit)
indices = []
rwmol = Chem.RWMol(ligand_rdkit)
rwmol.BeginBatchEdit()
for atom in rwmol.GetAtoms():
idx = atom.GetIdx()
atom.SetIntProp("_MDAnalysis_index", idx)
if atom.GetAtomicNum() == 1:
if idx % 2:
indices.append(idx)
else:
neighbor = atom.GetNeighbors()[0]
rwmol.RemoveBond(idx, neighbor.GetIdx())
rwmol.RemoveAtom(idx)
neighbor.SetNumExplicitHs(1)
rwmol.CommitBatchEdit()
pdbqt_mol = rwmol.GetMol()
mol = pdbqt_supplier._adjust_hydrogens(template, pdbqt_mol)
hydrogens = [
idx for atom in mol.GetAtoms()
if atom.HasProp("_MDAnalysis_index")
and (idx := atom.GetIntProp("_MDAnalysis_index")) in indices
]
assert hydrogens == indices


class TestSDFSupplier(SupplierBase):
@pytest.fixture
Expand Down