🥨 Identify Knot Types by Machine Learning

To mark the 5-year anniversary of the knot type classification project, we release this public repository to provide the training code, best model with weights, and two showcases of generalizability that all run out-of-the-box in a GPU-enabled docker container.

The work was published in Physical Review E in February 2020 as a research article titled "Identifying knot types of polymer conformations by machine learning":

@article{PhysRevE.101.022502,
  title = {Identifying knot types of polymer conformations by machine learning},
  author = {Vandans, Olafs and Yang, Kaiyuan and Wu, Zhongtao and Dai, Liang},
  journal = {Phys. Rev. E},
  volume = {101},
  issue = {2},
  pages = {022502},
  numpages = {10},
  year = {2020},
  month = {Feb},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevE.101.022502},
  url = {https://link.aps.org/doi/10.1103/PhysRevE.101.022502}
}

This work, featured as an APS Editors' Suggestion, was one of the first successful attempts to use deep learning to classify knot types. It has since received media coverage:

🥨 featured in Nature's Research Highlight: "A neural network unpicks the knots" on 21 February 2020
🥨 featured in APS Physics Interview: "Neural Networks Know Their Knots" on 21 February 2020
🥨 featured in City University of Hong Kong Research Stories: "CityU scientists classify knots efficiently with artificial intelligence" on 16 April 2020
🥨 featured in Official CityU Research News (in Chinese): "相比传统算法，人工智能分辨纽结快了20倍！" ("Compared with traditional algorithms, AI identifies knots 20 times faster!") on 29 July 2020

Here we demonstrate the training code and showcase the generalizability with jupyter notebooks that run in a GPU-enabled docker container (see section Docker-setup). The best model with weights is provided in this repo (see section Best-Model). The data used in the demo are freely accessible on Zenodo; see the Data section for download and extraction instructions.

Table of Contents 🥨

Data Used in the Demo
Docker Setup
Training Code
Best Model with Weights
Generalize to L60 Sub-length
Generalize to a Different Bending Stiffness

0. Data Used in the Demo

We release the following data to accompany this demo repo:

L60 circular knots (L60_Lp4_D9), five knot types:
- Used in Table 1 and Fig. 7 of the publication.
- Used in this repo by
  - Section "Training Code" ./Demo_Train_L60_Classifier.ipynb to reproduce Table 1
  - Section "Generalize to L60 Sub-length" ./Generalize_SubLength_L60_Fig7.ipynb to reproduce Fig. 7
- Each knot type has 200K conformations, so 1 million conformations in total.
- Zipped as 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz for download
L100 circular knots (L100_Lp2_D11) of persistence length Lp = 2a:
- Used in Fig. 11 of the publication.
- Used in this repo by
  - Section "Generalize to a Different Bending Stiffness" ./Generalize_Bending_Stiffness_Fig11.ipynb to reproduce Fig. 11
- Each knot type has more than 20K conformations; sample from them or use all of them to reproduce Fig. 11 of the paper.
- Zipped as L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz for download

Both the L60 and L100 datasets are to classify five knot types: knot-0, knot-31, knot-41, knot-52, and knot-51. Each conformation is represented as a txt file of 3D xyz coordinates.
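Since each conformation is a plain txt file of xyz coordinates, it can be read into an array in a few lines. The following is a minimal sketch, not code from the notebooks: the function name load_conformation is hypothetical, and it assumes one monomer per row with three whitespace-separated columns and no header line.

```python
import numpy as np


def load_conformation(path):
    """Load one conformation as an (N, 3) float array of x, y, z coordinates.

    Assumption: whitespace-separated values, one monomer per row, no header.
    """
    coords = np.loadtxt(path, ndmin=2)
    if coords.shape[1] != 3:
        raise ValueError(
            f"expected 3 columns (x y z), got {coords.shape[1]} in {path}"
        )
    return coords
```

For an L60 conformation stored this way, the returned array would have shape (60, 3).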

The data are released as open public data on Zenodo at https://zenodo.org/records/10946638

1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz Download
L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz Download

To download the data from Zenodo, either run our script ./download_data.sh or click the download links directly. Save the files to the ./data/ folder.

Extract the tar.gz data inside the ./data/ folder:

# extract the five knot-type tar.gz files
cd ./data
tar -xzvf 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz
tar -xzvf L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz
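If tar is not available on your system (for example on Windows), the same extraction can be sketched with the Python standard library. The helper name extract_archive is hypothetical and is not part of this repo.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="./data"):
    """Extract a .tar.gz archive into dest_dir and return the member count."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = tar.getmembers()
        tar.extractall(path=dest)
    return len(members)


# Example call (run from the repo root after downloading the archives):
# extract_archive("./data/L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz")
```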

1. Docker with Compatible TF2+CUDA+Py

We provide a ./Dockerfile to build a docker container based on tensorflow:2.4.0-gpu-jupyter. The code we developed 5 years ago was based on tensorflow-gpu==2.0.0, but we found that the docker image tensorflow:2.4.0-gpu-jupyter also works. The docker image uses CUDA 11. The advantage of running tensorflow in docker is that you do not need to worry (too much) about CUDA versions. The notebooks in this repo were generated on a laptop GPU (RTX 3080) with CUDA 11.4.

To build the docker image, run the provided bash script ./build.sh. Then launch the docker container with ./run.sh. Note that we use the docker -v flag to mount the current directory as the jupyter directory (/tf), so the container works on the files in the current directory at run time.

# build the container
bash ./build.sh

# run jupyter server inside the container (follow instructions on screen)
bash ./run.sh

run.sh will launch a jupyter notebook server. Follow the text prompts and open the URL in your host web browser: http://127.0.0.1:8888/?token=...
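Once the notebook server is up, it can be worth confirming that TensorFlow inside the container actually sees the GPU before starting a long training run. The cell below is a small sketch of such a check; the helper name visible_gpus is hypothetical, and it falls back gracefully when TensorFlow is not installed.

```python
def visible_gpus():
    """Return names of GPUs TensorFlow can see, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("TensorFlow is running on CPU only")
```

If the list is empty inside the container, the usual suspects are the host NVIDIA driver or the GPU options passed to docker.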

2. Training Code (Demo on L60 200K dataset)

We prepared a jupyter notebook at ./Demo_Train_L60_Classifier.ipynb showcasing how to train a polymer knot-type classifier based on LSTM with Tensorflow.

The dataset used for this training notebook comprises five knot types of length L60 and each knot type has 200K conformations.

This tutorial notebook reproduces the L60 results in Table 1 of our publication, i.e. a training accuracy of ~99%, a validation accuracy of ~98%, and ~98% accuracy on the held-out test set.
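The three accuracies above rely on disjoint training, validation, and test subsets. As an illustration only, one common way to produce such a split is to shuffle the sample indices once and slice them. The function name split_indices and the 80/10/10 fractions below are assumptions for this sketch, not the values used in the notebook.

```python
import numpy as np


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle indices 0..n_samples-1 and split into train/val/test arrays.

    The fractions are illustrative assumptions, not the notebook's settings.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx
```

Fixing the seed keeps the held-out test set identical across runs, so reported test accuracies stay comparable.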

3. Best Model with Weights (trained on L100 2M dataset)

The best RNN model for L100 was a bidirectional LSTM stack with dropout trained on 2 million conformations for each knot type. The model with weights can be loaded directly (~10 MB) from this repo at best_models/temp_20191103-175055_L100_2M_0-31-41-52-51-relative_BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5, using keras.models.load_model():

# Keras ships with TensorFlow 2.x
from tensorflow import keras

# Best RNN model:
# Nov03 99.59acc BiLSTM stacks
best_model_dir = "best_models/"
save_model_name = best_model_dir + \
    "temp_20191103-175055_L100_2M_0-31-41-52-51-relative_BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5"

print("model to be used: ", save_model_name)

# load the architecture and the trained weights from the .h5 file
model = keras.models.load_model(save_model_name)

# print the layer-by-layer summary
model.summary()
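The model file name appears to encode the layer stack. A small sketch can pull the layer tokens out of the name. Here we assume (this is our reading of the name, so treat model.summary() as the authoritative source) that the number after BiLSTM is the unit count of a bidirectional LSTM layer and the number after Dp is a dropout percentage. The function name parse_layer_tokens is hypothetical.

```python
import re


def parse_layer_tokens(model_filename):
    """Return (layer_kind, number) pairs found in a model file name.

    Assumption: 'BiLSTM240' means a BiLSTM layer with 240 units and
    'Dp20' means dropout of 20%; check against model.summary().
    """
    pattern = r"(BiLSTM|Dp)(\d+)"
    return [(kind, int(number))
            for kind, number in re.findall(pattern, model_filename)]


name = ("temp_20191103-175055_L100_2M_0-31-41-52-51-relative_"
        "BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5")
for kind, number in parse_layer_tokens(name):
    print(kind, number)
```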

4. Generalize to L60 Sub-length

The notebook ./Generalize_SubLength_L60_Fig7.ipynb showcases how well the L100 classifier generalizes to identifying knots in shorter L60 polymers. The model was trained on polymers of length L100 and is asked to predict knot-type labels for 1 million L60 polymers. This tutorial reproduces Figure 7 (left panel) of the publication.
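Predicting on 1 million conformations is easier on memory when the files are processed in chunks rather than loaded all at once. The generator below is a generic sketch of that idea. The name iter_batches and the chunk size are assumptions, and the notebook may organize the prediction loop differently.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical use with a list of conformation file paths:
# for chunk in iter_batches(file_paths, 10000):
#     coords = ...  # load and stack the files in `chunk`
#     probs = model.predict(coords)
```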

5. Generalize to a Different Bending Stiffness

The notebook ./Generalize_Bending_Stiffness_Fig11.ipynb loads the best weights of the model trained on L100 and predicts on unseen conformations with a different bending stiffness.

In the body of the paper, the polymer conformations are generated with a persistence length Lp = 4a. To examine whether our model also works for polymer conformations with a different bending stiffness, we generate 20 000 conformations of Lpolymer = 100 and Lp = 2a for each of the five knot types. Then, we apply the RNN model trained from conformations with Lp = 4a to classify these new conformations with Lp = 2a. The prediction accuracy is above 99% for every knot type, as shown in Fig. 11 of our PRE publication.

These results suggest that the prediction accuracy of our NN is robust to changes in bending stiffness.
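The per-knot-type accuracy quoted above can be computed from integer class labels in a few lines. This is a minimal sketch rather than the notebook's code: the function name per_class_accuracy is hypothetical, and the mapping of class indices 0..4 to knot types in KNOT_LABELS is an assumption that follows the order used in the data file names.

```python
import numpy as np

# Assumed class order, following the data file names (0-31-41-52-51).
KNOT_LABELS = ["0", "31", "41", "52", "51"]


def per_class_accuracy(y_true, y_pred, n_classes=5):
    """Return an array with the fraction of correct predictions per class.

    Classes that never occur in y_true get NaN.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc
```

With model outputs of shape (n_samples, 5), y_pred would typically be obtained with np.argmax(probs, axis=1).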
