Changed theme to RTD and created quick start guide (#11)
* added theme
* added examples gallery
* created quick start guide
lilyminium authored Sep 22, 2019
1 parent 8af54c0 commit 59844b3
Showing 14 changed files with 2,146 additions and 49 deletions.
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,11 +1,13 @@
*~
.vagrant
notebooks/.ipynb_checkpoints
notebooks/*.png
notebooks/*.svg
notebooks/*.pdf
notebooks/*.pdb
notebooks/*.ndx
tmp
doc/build
doc/source/.ipynb_checkpoints
**/.ipynb_checkpoints
**/.vscode
.vscode
**/.DS_Store
6 changes: 6 additions & 0 deletions doc/examples/README.rst
@@ -0,0 +1,6 @@

========
Examples
========

Here are some examples.
344 changes: 344 additions & 0 deletions doc/examples/analysis/pca.ipynb
@@ -0,0 +1,344 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Principal Component Analysis in MDAnalysis\n",
"\n",
"2019\n",
"\n",
"Author: [Lily Wang](http://minium.com.au) [(@lilyminium)](https://github.com/lilyminium)\n",
"\n",
"Inspired by the MDAnalysis PCA tutorial by [Kathleen Clark](https://becksteinlab.physics.asu.edu/people/75/kathleen-clark) [(@kaceyreidy)](https://github.com/kaceyreidy)\n",
"\n",
"In this tutorial we:\n",
"\n",
"* use PCA to analyse and visualise large macromolecular conformational changes in the enzyme adenylate kinase (AdK)\n",
"* use PCA to compare the conformational differences between ???"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Background"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Principal component analysis (PCA) is a statistical technique that decomposes a system of observations into linearly uncorrelated variables called **principal components**. These components are ordered so that the first principal component accounts for the largest variance in the data, and each following component accounts for less and less variance. PCA is often applied to molecular dynamics trajectories to extract the large-scale conformational motions or \"essential dynamics\" of a protein. The frame-by-frame conformational fluctuation can be considered a linear combination of the essential dynamics yielded by the PCA.\n",
"\n",
"In MDAnalysis, the method is as follows:\n",
"\n",
"1. Optionally align each frame in your trajectory to the first frame.\n",
"2. Construct a 3N x 3N covariance matrix for the N atoms in your trajectory. Optionally, you can provide a mean; otherwise the covariance is computed with respect to the structure averaged over the trajectory.\n",
"3. Diagonalise the covariance matrix. The eigenvectors are the principal components, and their eigenvalues are the associated variances.\n",
"4. Sort the eigenvalues in decreasing order so that the principal components are ordered by variance.\n",
"\n",
"<div class=\"alert alert-warning\">\n",
" \n",
"**Note**\n",
" \n",
"Principal component analysis algorithms are deterministic, but their solutions are not unique. For example, you can flip the sign of an eigenvector without altering the PCA. Different implementations may therefore produce different answers; `MDAnalysis` is likely to return different solutions from, say, `cpptraj`.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Large conformational changes in adenylate kinase"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In MDAnalysis, analysis modules usually need to be imported explicitly. The `pca` module contains the `PCA` class that we will use for analysis. We also import the AdK files from the MDAnalysis test suite."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3996de84aab74b3aa903e977a8ac80b7",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"_ColormakerRegistry()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import MDAnalysis as mda\n",
"import MDAnalysis.analysis.pca as pca\n",
"from MDAnalysisTests.datafiles import PSF, DCD\n",
"\n",
"import nglview as nv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As usual, we start off by creating a universe."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"u = mda.Universe(PSF, DCD)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike other analyses, `pca.PCA` can only be applied to `Universe`s. The default `PCA` arguments are:\n",
"\n",
"```python\n",
"my_pca = pca.PCA(u, select='all', align=False, mean=None, n_components=None)\n",
"```\n",
"\n",
"By default (`align=False`), your trajectory will not be aligned to any structure. If you set `align=True`, every frame will be aligned to the first frame of your trajectory, based on the atoms in your `select` string. \n",
"\n",
"As PCA is usually used to extract large-scale conformational motions, we select only the backbone atoms here."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"pc = pca.PCA(u, select=\"backbone\", align=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you set up the class, you can run the analysis with `.run(start=None, stop=None, step=None, verbose=None)`. The `start`, `stop`, and `step` arguments specify which frames the analysis is computed over. The default arguments compute over every frame."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<MDAnalysis.analysis.pca.PCA at 0x11c506ef0>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pc.run(verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The principal components are accessible in `.p_components`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2565, 2565)\n"
]
},
{
"data": {
"text/plain": [
"array([ 0.02725098, 0.00156086, 0.00816821, ..., -0.01783826,\n",
" 0.04746114, 0.04257271])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(pc.p_components.shape)\n",
"pc.p_components[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The variance of each principal component is in `.variance`. For example, to get the variance explained by the first principal component:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"281443.5086197605"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pc.variance[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This variance is somewhat meaningless by itself. It is much more intuitive to consider the variance of a principal component as a fraction of the total variance in the data. MDAnalysis tracks this cumulative fraction in `.cumulated_variance`. As shown below, the first principal component accounts for 98.7% of the total trajectory variance. The first four components combined (index 3) account for 99.9% of the total variance."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9873464381554058\n",
"0.999419901112709\n"
]
}
],
"source": [
"print(pc.cumulated_variance[0])\n",
"print(pc.cumulated_variance[3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The average structure is also saved as an `AtomGroup` in `.mean_atoms`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[16.297781 6.8397956 -7.622989 ]\n",
" [14.900139 7.062459 -7.235277 ]\n",
" [14.185768 5.8268375 -6.879689 ]\n",
" ...\n",
" [13.035071 15.354209 -3.8042812]\n",
" [13.695147 15.725297 -4.988666 ]\n",
" [12.63667 15.566869 -6.1185045]]\n"
]
}
],
"source": [
"print(pc.mean_atoms.positions)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a33209712ed34b32b969c1f8258aed91",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"NGLWidget()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mean_structure = mda.Merge(pc.mean_atoms)\n",
"nv.show_mdanalysis(mean_structure)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (mdanalysis)",
"language": "python",
"name": "mdanalysis"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
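The Background cell of the notebook above lists the PCA steps: build a 3N x 3N covariance matrix, diagonalise it, and sort the components by variance. These steps can be sketched in plain NumPy on synthetic coordinates. This is a minimal illustration, not the MDAnalysis implementation. The function name `pca_from_coordinates`, the synthetic "trajectory", and the normalisation by `n_frames - 1` are all assumptions made for this example, and alignment (step 1) is skipped.

```python
import numpy as np


def pca_from_coordinates(coords):
    """Hypothetical helper: PCA of coordinates shaped (n_frames, n_atoms, 3).

    Returns the variances (descending), the components (as columns),
    and the cumulative variance as a fraction of the total.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)        # (n_frames, 3N)
    centered = flat - flat.mean(axis=0)        # subtract the average structure
    # 3N x 3N covariance matrix (normalisation is an assumption of this sketch)
    cov = centered.T @ centered / (n_frames - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing variance
    variance = eigvals[order]
    components = eigvecs[:, order]
    cumulated = np.cumsum(variance) / variance.sum()
    return variance, components, cumulated


# Synthetic data: one dominant collective motion plus small random noise
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 5
direction = rng.normal(size=(n_atoms, 3))
amplitude = rng.normal(scale=5.0, size=(n_frames, 1, 1))
noise = rng.normal(scale=0.1, size=(n_frames, n_atoms, 3))
coords = amplitude * direction + noise

variance, components, cumulated = pca_from_coordinates(coords)
print(components.shape)   # one column per component, 3N entries each
print(cumulated[:3])      # cumulative fraction of variance explained
```

Because the synthetic data contain a single dominant motion, almost all of the variance lands in the first component. Multiplying any column of `components` by -1 gives an equally valid eigenvector, which is the non-uniqueness the notebook's note describes.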
Binary file added doc/source/_static/.DS_Store
Binary file not shown.