Fix PDBQT parser for wrong H elements interpolation #957
PDBQT files (used by the AutoDock programs) are derived from PDB files, notably by the addition of partial charges (Q) and specific atom types (T).
The atom type is written in the last columns of each PDBQT record, in place of the element symbol when applicable.
To detect elements despite this, the previous code inferred the element symbol from the first letter(s) of the atom name.
This works well in most cases but, at least in some files found on the Webina project, some H atoms have names beginning with a digit.
This leads to an unrecognized element name, which is given a default radius larger than the one expected for a hydrogen; this in turn causes wrong bond detection, as can be seen in the attached screenshot.
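To make the failure mode concrete, here is a minimal Python sketch of a name-based heuristic. It is not the project's actual code: the function name, the reduced element set, and the sample atom names are assumptions chosen for illustration only.

```python
# Hypothetical, simplified element set; a real parser would know the full
# periodic table and also handle two-letter symbols.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "P"}


def element_from_atom_name(atom_name):
    """Guess the element from the first character of the atom name.

    Returns None when the first character is not a known element symbol,
    which is what happens for names that start with a digit.
    """
    name = atom_name.strip().upper()
    first = name[:1]
    return first if first in KNOWN_ELEMENTS else None


# A conventional hydrogen name resolves as expected.
print(element_from_atom_name("HD2"))
# A digit-leading hydrogen name (illustrative) is not recognized.
print(element_from_atom_name("1HD2"))
```

With a heuristic of this kind, the second atom would end up with no recognized element and so fall back to whatever default radius the viewer uses.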
This PR fixes the bug by mapping PDBQT atom types to elements (the mapping is derived from the meeko library).
A PDBQT file excerpt has been added to the test dataset to check the fix. The fix can also be tested using the PDB file from the Webina project link cited previously.
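As a rough illustration of the type-based approach, the Python sketch below resolves the element from the atom type instead of the atom name. The table is a small hand-written subset of common AutoDock atom types, not the mapping taken from meeko, and the function name, the sample record, and the choice to read the type as the last whitespace-separated field are all assumptions made for this example.

```python
# Illustrative subset of AutoDock atom types mapped to element symbols.
# This is NOT the complete table used by the PR.
ATOM_TYPE_TO_ELEMENT = {
    "H": "H",   # non-polar hydrogen
    "HD": "H",  # hydrogen-bond donor hydrogen
    "C": "C",   # aliphatic carbon
    "A": "C",   # aromatic carbon
    "N": "N",   # nitrogen
    "NA": "N",  # hydrogen-bond acceptor nitrogen
    "OA": "O",  # hydrogen-bond acceptor oxygen
    "SA": "S",  # hydrogen-bond acceptor sulfur
    "S": "S",   # sulfur
    "P": "P",   # phosphorus
}


def element_from_pdbqt_line(line):
    """Return the element for an ATOM/HETATM record, or None if unknown.

    Assumption: the atom type is the last whitespace-separated field of
    the record, so the atom name is never consulted.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    fields = line.split()
    atom_type = fields[-1]
    return ATOM_TYPE_TO_ELEMENT.get(atom_type)


# Made-up record whose atom name starts with a digit.
record = "ATOM      1 1HD2 ASN A   1      10.000  10.000  10.000  1.00  0.00     0.163 HD"
print(element_from_pdbqt_line(record))
```

Because the lookup ignores the atom name entirely, a digit-leading name no longer affects the result; atom types missing from the table still return None and would need a fallback.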