
Structural analysis

Sebastian Keller edited this page Nov 22, 2021 · 3 revisions

The structural analysis in StructMAn computes a set of structural features for individual residues of a protein structure. The calculations fall into three parts:

Calculations related to solvent-accessible area

The solvent-accessible area of a residue is an important measure for distinguishing the functional roles of residues. Most solvent-accessible area calculations are performed with xssp; only for protein structures with missing atom coordinates do we use SphereCon instead. We divide the solvent-accessible area of a residue by its total surface area to obtain the relative solvent-accessible area (RSA), and we further compute separate RSA values for sidechain atoms and mainchain atoms. Based on the RSA, we classify each residue into one of three structural locations:

  • Surface (RSA >= 0.16)
  • Buried (0.16 > RSA >= 0.05)
  • Core (RSA < 0.05)
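The RSA computation and the classification above can be sketched in Python as follows. This is a minimal illustration rather than StructMAn's code: the function names are hypothetical, and the accessible and total areas are assumed to be already available (for example from xssp or SphereCon output).

```python
def relative_solvent_accessibility(sasa: float, total_area: float) -> float:
    """RSA = solvent-accessible area / total surface area of the residue."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return sasa / total_area


def structural_location(rsa: float) -> str:
    """Classify a residue by RSA using the Surface/Buried/Core thresholds."""
    if rsa >= 0.16:
        return "Surface"
    if rsa >= 0.05:
        return "Buried"
    return "Core"


# Example: 30 Å² accessible out of 150 Å² total gives RSA = 0.2
rsa = relative_solvent_accessibility(30.0, 150.0)
print(round(rsa, 2), structural_location(rsa))  # 0.2 Surface
```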

Distance calculations

Definition of shortest distance

The shortest distance between a residue and another molecule is the minimum Euclidean distance between any atom of the residue and any atom of that molecule.
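A brute-force sketch of this definition, assuming atoms are given as plain (x, y, z) coordinate tuples; the function name is hypothetical. It compares every atom pair, which costs O(n·m); a production implementation would more likely use a spatial index.

```python
import math
from typing import Iterable, Tuple

Coord = Tuple[float, float, float]


def shortest_distance(residue_atoms: Iterable[Coord],
                      molecule_atoms: Iterable[Coord]) -> float:
    """Minimum Euclidean distance over all (residue atom, molecule atom) pairs."""
    molecule_atoms = list(molecule_atoms)  # allow repeated iteration
    best = math.inf
    for a in residue_atoms:
        for b in molecule_atoms:
            best = min(best, math.dist(a, b))
    return best


residue = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ligand = [(4.5, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(shortest_distance(residue, ligand))  # 5.0
```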


For each residue, we calculate the shortest distance to every other molecule in the structure. For each molecule type, we store the smallest of these distances. We distinguish the following molecule types:

  • Protein chains
  • DNA chains
  • RNA chains
  • Peptides
  • Low-molecular-weight ligands
  • Metals
  • Non-metal ions
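Keeping one value per molecule type amounts to a running minimum over all molecules of that type. In this hypothetical sketch the type labels are made up, and a molecule type that is absent from the structure is represented by infinity, which is an assumption rather than StructMAn's documented behavior.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical labels for the molecule types listed above.
MOLECULE_TYPES = ("protein", "dna", "rna", "peptide", "ligand", "metal", "ion")


def min_distance_per_type(distances: List[Tuple[str, float]]) -> Dict[str, float]:
    """Given (molecule_type, shortest_distance) pairs for one residue,
    keep the smallest distance per type; absent types stay at inf."""
    result = {t: math.inf for t in MOLECULE_TYPES}
    for mol_type, dist in distances:
        if mol_type not in result:
            raise ValueError(f"unknown molecule type: {mol_type}")
        result[mol_type] = min(result[mol_type], dist)
    return result


per_molecule = [("protein", 3.8), ("protein", 2.9), ("ligand", 6.1), ("metal", 2.2)]
features = min_distance_per_type(per_molecule)
print(features["protein"], features["ligand"], features["dna"])  # 2.9 6.1 inf
```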

RIN-based calculations

RIN calculation

Residue interaction networks (RINs) are graph representations of protein structures in which nodes represent residues and edges represent interactions between them. We use RINerator to generate the RIN data structures.
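Conceptually, a RIN can be held as an undirected adjacency map. The sketch below assumes the interactions have already been parsed into residue pairs; it does not read RINerator's actual output format, and the (chain, residue number) identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical residue identifier: (chain, residue number)
Residue = Tuple[str, int]


def build_rin(edges: List[Tuple[Residue, Residue]]) -> Dict[Residue, Set[Residue]]:
    """Build an undirected residue interaction network as an adjacency map."""
    graph: Dict[Residue, Set[Residue]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


edges = [(("A", 10), ("A", 11)), (("A", 10), ("A", 45)), (("A", 10), ("B", 7))]
rin = build_rin(edges)
print(sorted(rin[("A", 10)]))  # [('A', 11), ('A', 45), ('B', 7)]
```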

RIN-based structural features

Analogous to the distance calculations, we detect interactions between the analyzed residue and all other molecules in the RIN, and we distinguish the same types of interaction partners. In addition, we examine interactions between the analyzed residue and other residues of the same chain. Here, we distinguish residues by their distance in the amino acid sequence of the protein:

  • Neighbor (sequence distance 1, i.e. the two directly adjacent amino acids)
  • Short (sequence distance < 6)
  • Long (sequence distance >= 6)

For all interaction types detected in a RIN, we store:

  • Interaction degree (number of individual interactions of that interaction type)
  • Interaction score (sum of the Probe scores of all corresponding interactions)
  • H-bond score
  • Overlap score (a negative score penalizing clashes of van der Waals spheres)
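The sequence-distance classes and the stored per-type values can be sketched as below. Assumptions: positions are sequence positions within one chain; a distance of 1 is reported as Neighbor rather than Short (so Short effectively covers distances 2 to 5); and each interaction arrives as a hypothetical (type, Probe score, H-bond score, overlap score) record. Summing the records per type yields the degree and the three scores.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


def sequence_distance_class(pos_a: int, pos_b: int) -> str:
    """Classify an intra-chain contact by sequence distance.

    Assumption: Neighbor (distance 1) takes precedence over Short.
    """
    d = abs(pos_a - pos_b)
    if d == 0:
        raise ValueError("pos_a and pos_b must differ")
    if d == 1:
        return "Neighbor"
    if d < 6:
        return "Short"
    return "Long"


@dataclass
class InteractionFeatures:
    degree: int = 0       # number of interactions of this type
    score: float = 0.0    # summed Probe score
    hbond: float = 0.0    # summed H-bond score
    overlap: float = 0.0  # summed (negative) overlap score


def aggregate(records: Iterable[Tuple[str, float, float, float]]) -> Dict[str, InteractionFeatures]:
    """Group hypothetical (type, score, hbond, overlap) records by interaction type."""
    feats: Dict[str, InteractionFeatures] = defaultdict(InteractionFeatures)
    for itype, score, hbond, overlap in records:
        f = feats[itype]
        f.degree += 1
        f.score += score
        f.hbond += hbond
        f.overlap += overlap
    return dict(feats)


print([sequence_distance_class(50, p) for p in (51, 47, 56)])
# ['Neighbor', 'Short', 'Long']

rows = [("ligand", 1.5, 0.5, -0.25), ("ligand", 0.5, 0.0, 0.0), ("Long", 2.0, 1.0, -0.5)]
print(aggregate(rows)["ligand"])
# InteractionFeatures(degree=2, score=2.0, hbond=0.5, overlap=-0.25)
```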

Graph centrality features

We calculate twelve centrality scores that measure how strongly the residue is connected within the protein structure. They comprise all combinations of three normalizations and four graph constructions (3 × 4 = 12). The three normalizations are:

  • None (the absolute centrality value)
  • Min-Max normalization
  • Zero-One normalization
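Min-max normalization rescales a residue's centrality relative to all residues of the same graph. A short sketch follows; the function name is hypothetical, and mapping a graph in which all values are equal to 0 is an assumption. The page does not define the Zero-One variant precisely, so it is not shown.

```python
from typing import Dict, Hashable


def min_max_normalize(values: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Rescale centrality values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}  # assumption: constant input maps to 0
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


raw = {"R1": 2.0, "R2": 6.0, "R3": 4.0}
print(min_max_normalize(raw))  # {'R1': 0.0, 'R2': 1.0, 'R3': 0.5}
```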

The four graph constructions are:

  • Protein chain only
  • Protein chain only, with the overlap (clash) scores subtracted
  • Whole complex
  • Whole complex, with the overlap (clash) scores subtracted
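The four graph constructions behave like two independent switches: chain versus complex, and clashes subtracted or not. The sketch below uses weighted degree (node strength) purely as a stand-in centrality, since the page does not name the centrality measures. It also assumes that subtracting the overlap clash scores means lowering each edge weight by the magnitude of its overlap score; the edge record layout and node names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical edge record: (node_a, node_b, chain_a, chain_b, interaction_score, overlap_score)
Edge = Tuple[str, str, str, str, float, float]


def weighted_degree(edges: List[Edge], chain: str,
                    whole_complex: bool, subtract_clashes: bool) -> Dict[str, float]:
    """Node strength (sum of incident edge weights) as a stand-in centrality."""
    strength: Dict[str, float] = defaultdict(float)
    for a, b, chain_a, chain_b, score, overlap in edges:
        if not whole_complex and (chain_a != chain or chain_b != chain):
            continue  # "Protein chain only": keep edges within the analyzed chain
        # Assumption: clash subtraction lowers the weight by |overlap score|.
        weight = score - abs(overlap) if subtract_clashes else score
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


edges: List[Edge] = [
    ("A10", "A11", "A", "A", 2.0, -0.5),
    ("A11", "A12", "A", "A", 1.5, 0.0),
    ("A10", "LIG1", "A", "L", 1.0, 0.0),
]

for whole_complex, subtract in product((False, True), repeat=2):
    c = weighted_degree(edges, "A", whole_complex, subtract)
    print(whole_complex, subtract, c["A10"])
# False False 2.0
# False True 1.5
# True False 3.0
# True True 2.5
```

Applying each of the three normalizations to each of these four graph variants gives the twelve combinations.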