To mark the 5-year anniversary of the knot type classification project, we release this public repository to provide the training code, best model with weights, and two showcases of generalizability that all run out-of-the-box in a GPU-enabled docker container.
The work was published in Physical Review E in February 2020 as a research article titled "Identifying knot types of polymer conformations by machine learning":
@article{PhysRevE.101.022502,
title = {Identifying knot types of polymer conformations by machine learning},
author = {Vandans, Olafs and Yang, Kaiyuan and Wu, Zhongtao and Dai, Liang},
journal = {Phys. Rev. E},
volume = {101},
issue = {2},
pages = {022502},
numpages = {10},
year = {2020},
month = {Feb},
publisher = {American Physical Society},
doi = {10.1103/PhysRevE.101.022502},
url = {https://link.aps.org/doi/10.1103/PhysRevE.101.022502}
}
This work, featured as an APS Editors' Suggestion, was one of the first successful attempts to classify knot types with deep learning. It has since attracted media coverage:
- 🥨 featured in Nature's Research Highlight: "A neural network unpicks the knots" on 21 February 2020
- 🥨 featured in APS Physics Interview: "Neural Networks Know Their Knots" on 21 February 2020
- 🥨 featured in City University of Hong Kong Research Stories: "CityU scientists classify knots efficiently with artificial intelligence" on 16 April 2020
- 🥨 featured in Official CityU Research News (in Chinese): "相比传统算法,人工智能分辨纽结快了20倍!" ("Compared with traditional algorithms, AI identifies knots 20 times faster!") on 29 July 2020
Here we demonstrate the training code and showcase generalizability with Jupyter notebooks that run in a GPU-enabled Docker container (see the Docker Setup section). The best model and its weights are provided in this repo (see the Best Model with Weights section). The data used in the demo are freely accessible on Zenodo; see the Data Used in the Demo section for download and extraction instructions.
- Data Used in the Demo
- Docker Setup
- Training Code
- Best Model with Weights
- Generalize to L60 Sub-length
- Generalize to a Different Bending Stiffness
We release the following data to accompany this demo repo:
- L60 circular knots (L60_Lp4_D9), five knot types:
  - Used in Table 1 and Fig. 7 of the publication.
  - Used in this repo by
    - Section "Training Code": ./Demo_Train_L60_Classifier.ipynb to reproduce Table 1
    - Section "Generalize to L60 Sub-length": ./Generalize_SubLength_L60_Fig7.ipynb to reproduce Fig. 7
  - Each knot type has 200K conformations, so 1 million conformations in total.
  - Zipped as 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz for download
- L100 circular knots (L100_Lp2_D11) with persistence length Lp = 2a:
  - Used in Fig. 11 of the publication.
  - Used in this repo by
    - Section "Generalize to a Different Bending Stiffness": ./Generalize_Bending_Stiffness_Fig11.ipynb to reproduce Fig. 11
  - Each knot type has more than 20K conformations; sample a subset or use all of them to reproduce Fig. 11 of the paper.
  - Zipped as L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz for download
Both the L60 and L100 datasets cover the same five knot types to be classified: knot-0, knot-31, knot-41, knot-52, and knot-51.
Each conformation is represented as a txt file of 3D xyz coordinates.
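As a minimal sketch of reading one such file, the snippet below loads a conformation into an (N, 3) NumPy array. It assumes one bead per line with three whitespace-separated numbers, which is not confirmed here, so check a real file first. The helper name `load_conformation` and the demo file are hypothetical and not part of this repo.

```python
import os
import tempfile

import numpy as np


def load_conformation(path):
    """Load one conformation as an (N, 3) float array of xyz coordinates.

    Assumption: one bead per line, three whitespace-separated numbers.
    """
    coords = np.loadtxt(path, ndmin=2)
    if coords.shape[1] != 3:
        raise ValueError(f"expected 3 columns (x y z), got {coords.shape[1]}")
    return coords


# Tiny made-up example file (not taken from the released data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_conformation.txt")
    with open(demo_path, "w") as f:
        f.write("0.0 0.0 0.0\n1.0 0.0 0.0\n1.0 1.0 0.0\n")
    demo_coords = load_conformation(demo_path)

print(demo_coords.shape)
```

If the released files use a different delimiter or carry a header line, `np.loadtxt` accepts `delimiter` and `skiprows` arguments to handle that.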
The data are released as open public data on Zenodo at https://zenodo.org/records/10946638 as two archives:
- 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz
- L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz
To download the data from Zenodo, either run our script ./download_data.sh or click the download links on the Zenodo page. Save the archives to the ./data/ folder.
Extract the tar.gz archives inside the ./data/ folder:
# extract the two tar.gz archives (each covers the five knot types)
cd ./data
tar -xzvf 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz
tar -xzvf L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz
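If you prefer to extract from Python (for example, from inside a notebook), the same step can be sketched with the standard-library `tarfile` module. The helper name `extract_archive` is hypothetical. The archive names are the ones listed above, and the loop simply skips any archive that has not been downloaded yet.

```python
import tarfile
from pathlib import Path

# Archive names as listed in this README.
ARCHIVES = [
    "1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz",
    "L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz",
]


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member count.

    Only extract archives you trust, since tar members can carry arbitrary paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        tar.extractall(path=dest)
    return len(members)


# Extract whichever archives are present in ./data and report the missing ones.
for name in ARCHIVES:
    archive = Path("data") / name
    if archive.exists():
        count = extract_archive(archive, "data")
        print(f"{name}: extracted {count} members")
    else:
        print(f"{name}: not found, download it first")
```

The L60 archive holds about 1 million small files, so extraction can take a while either way.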
We provide a ./Dockerfile to build a Docker image based on tensorflow:2.4.0-gpu-jupyter.
The code we developed five years ago was based on tensorflow-gpu==2.0.0, but we have found that the Docker image tensorflow:2.4.0-gpu-jupyter also works.
The CUDA version in the Docker image is CUDA 11.
The advantage of using TensorFlow with Docker is that you do not need to worry (too much) about CUDA versions.
The notebooks from this repo were generated using a laptop GPU (RTX 3080) with CUDA 11.4.
To build the Docker image, simply run the provided bash script ./build.sh.
Then launch the Docker container with ./run.sh.
Note that we use the docker -v flag to mount the current directory as the Jupyter directory (/tf) in the container, so the files in the current directory are the ones the container uses at run time.
# build the container
bash ./build.sh
# run jupyter server inside the container (follow instructions on screen)
bash ./run.sh
run.sh will launch a Jupyter notebook server.
Simply follow the text prompts and open the URL in your host web browser: http://127.0.0.1:8888/?token=...
We prepared a Jupyter notebook at ./Demo_Train_L60_Classifier.ipynb showcasing how to train an LSTM-based polymer knot-type classifier with TensorFlow.
The dataset used for this training notebook comprises five knot types of length L60, and each knot type has 200K conformations.
This tutorial notebook reproduces the L60 results from Table 1 of our publication, i.e. training accuracy of ~99%, validation accuracy of ~98%, and ~98% accuracy on the hold-out test set.
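The three accuracy figures above come from three disjoint subsets of the data. As an illustration only, the sketch below shows one common way to produce such a split with NumPy. The 80/10/10 fractions, the seed, and the helper name `split_indices` are assumptions, not the settings used in the notebook.

```python
import numpy as np


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/val/test subsets.

    The fractions are illustrative assumptions, not the notebook's values.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices untouched until the final evaluation is what makes the hold-out accuracy a fair estimate.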
The best RNN model for L100 was a bidirectional LSTM stack with dropout, trained on 2 million conformations for each knot type. The model with weights (~10 MB) can be loaded directly from this repo at best_models/temp_20191103-175055_L100_2M_0-31-41-52-51-relative_BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5 using keras.models.load_model():
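# Best RNN model:
# Nov03 99.59acc BiLSTM stacks
# keras as bundled with TensorFlow 2.x
from tensorflow import keras

# location of the saved model (architecture + weights) in this repo
best_model_dir = "best_models/"
save_model_name = best_model_dir + \
    "temp_20191103-175055_L100_2M_0-31-41-52-51-relative_BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5"
print("model to be used: ", save_model_name)
# load the model and print its layer summary
model = keras.models.load_model(save_model_name)
model.summary()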
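Once the model is loaded, its predictions still need to be mapped back to knot types. The sketch below assumes the model outputs one row of five class probabilities per conformation and that the class order follows the "0-31-41-52-51" part of the file name. Both are assumptions to verify against the notebooks. The helper name `probabilities_to_labels` is hypothetical.

```python
import numpy as np

# Assumed class order, taken from the "0-31-41-52-51" part of the file name.
KNOT_LABELS = ["knot-0", "knot-31", "knot-41", "knot-52", "knot-51"]


def probabilities_to_labels(probs):
    """Map an (n_samples, 5) array of class probabilities to knot-type names."""
    probs = np.asarray(probs)
    if probs.ndim != 2 or probs.shape[1] != len(KNOT_LABELS):
        raise ValueError(f"expected shape (n, {len(KNOT_LABELS)}), got {probs.shape}")
    # The predicted class is the index of the largest probability in each row.
    return [KNOT_LABELS[i] for i in probs.argmax(axis=1)]


# Made-up probabilities standing in for the output of model.predict(...).
demo_probs = [
    [0.90, 0.05, 0.02, 0.02, 0.01],
    [0.01, 0.02, 0.02, 0.05, 0.90],
]
demo_labels = probabilities_to_labels(demo_probs)
print(demo_labels)
```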
The notebook ./Generalize_SubLength_L60_Fig7.ipynb showcases how well the L100 classifier generalizes to identifying knots in shorter L60 polymers.
The model was trained using polymer length L100 and was asked to predict knot type labels on 1 million L60 polymers.
This tutorial reproduces Figure 7 (left panel) from the publication.
The notebook ./Generalize_Bending_Stiffness_Fig11.ipynb loads the best weights for a model trained on L100 and predicts on unseen conformations with a different bending stiffness.
In the body of the paper, the polymer conformations are generated with a persistence length Lp = 4a. To examine whether our model also works for polymer conformations with a different bending stiffness, we generate 20 000 conformations of Lpolymer = 100 and Lp = 2a for each of the five knot types.
Then, we apply the RNN model trained from conformations with Lp = 4a to classify these new conformations with Lp = 2a. The prediction accuracy is above 99% for every knot type, as shown in Fig. 11 of our PRE publication.
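Accuracy "for every knot type" means the fraction of conformations of each true knot type that the model labels correctly. A minimal sketch of that calculation is shown below, with integer class ids 0 to 4 and made-up labels. The helper name `per_class_accuracy` is hypothetical, and the notebook may compute this differently.

```python
import numpy as np


def per_class_accuracy(y_true, y_pred, num_classes):
    """Return, for each class, the fraction of its samples predicted correctly.

    Classes with no samples in y_true get NaN.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc


# Made-up labels: one class-2 sample is misclassified as class 0.
demo_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
demo_pred = [0, 0, 1, 1, 2, 0, 3, 3, 4, 4]
demo_acc = per_class_accuracy(demo_true, demo_pred, num_classes=5)
print(demo_acc)
```

Reporting accuracy per class rather than overall makes it visible when a single knot type is systematically confused with another.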