🥨 Identify Knot Types by Machine Learning

cityuArticleBanner

To mark the 5-year anniversary of the knot type classification project, we release this public repository to provide the training code, best model with weights, and two showcases of generalizability that all run out-of-the-box in a GPU-enabled docker container.

The work was published in Physical Review E in February 2020 as a research article titled "Identifying knot types of polymer conformations by machine learning":

@article{PhysRevE.101.022502,
  title = {Identifying knot types of polymer conformations by machine learning},
  author = {Vandans, Olafs and Yang, Kaiyuan and Wu, Zhongtao and Dai, Liang},
  journal = {Phys. Rev. E},
  volume = {101},
  issue = {2},
  pages = {022502},
  numpages = {10},
  year = {2020},
  month = {Feb},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevE.101.022502},
  url = {https://link.aps.org/doi/10.1103/PhysRevE.101.022502}
}

This work, featured as an APS Editors' Suggestion, was one of the first successful attempts to use deep learning to classify knot types. It has since received media coverage.

Here we demonstrate the training code and showcase the generalizability with Jupyter notebooks that run in a GPU-enabled Docker container (see the Docker Setup section). The best model with weights is provided in this repo (see the Best Model section). The data used in the demo are freely accessible on Zenodo; see the Data section for download and extraction instructions.

Table of Contents 🥨

  0. Data Used in the Demo
  1. Docker Setup
  2. Training Code
  3. Best Model with Weights
  4. Generalize to L60 Sub-length
  5. Generalize to a Different Bending Stiffness

0. Data Used in the Demo

We release two datasets, L60 and L100, to accompany this demo repo.

Both datasets cover the same five knot types: knot-0, knot-31, knot-41, knot-52, and knot-51. Each conformation is stored as a txt file of 3D xyz coordinates.

The data are released as open public data on Zenodo at https://zenodo.org/records/10946638

  • 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz Download
  • L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz Download

To download the data from Zenodo, either use our script ./download_data.sh or click the download links above. Save the files to the ./data/ folder.

Extract the tar.gz data inside the ./data/ folder:

# extract the two tar.gz archives (each covers the five knot types)
cd ./data
tar -xzvf 1M_L60_Lp4_D9_circular_knot0-31-41-52-51.tar.gz
tar -xzvf L100_Lp2_D11_circular_knot0-31-41-52-51.tar.gz
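Once the archives are extracted, each conformation is a txt file of 3D xyz coordinates. The following is a minimal sketch of reading one such file with the Python standard library. It assumes one bead per line with three whitespace-separated numbers and no header; the exact layout is not specified here, so check a real file first. The function name `load_conformation` and the demo file are hypothetical.

```python
import tempfile
from pathlib import Path


def load_conformation(path):
    """Read one conformation as a list of (x, y, z) tuples.

    Assumption: one bead per line, three whitespace-separated floats.
    """
    coords = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        coords.append(tuple(float(value) for value in fields))
    return coords


# Demo on a tiny made-up file (not taken from the released data).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_conformation.txt"
    demo_file.write_text("0.0 0.0 0.0\n1.0 0.0 0.0\n1.0 1.0 0.0\n")
    beads = load_conformation(demo_file)

print(len(beads), "beads loaded")
```

If the real files use a different delimiter or carry a header line, the `split()` call and the column check are the places to adapt.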

1. Docker with Compatible TF2+CUDA+Py

We provide a ./Dockerfile to build a Docker image based on tensorflow:2.4.0-gpu-jupyter. The code we developed 5 years ago was based on tensorflow-gpu==2.0.0, but we have found that the tensorflow:2.4.0-gpu-jupyter image also works. The image ships with CUDA 11. The advantage of running TensorFlow in Docker is that you do not need to worry (too much) about CUDA versions. The notebooks in this repo were generated on a laptop GPU (RTX 3080) with CUDA 11.4.

To build the image, run the provided bash script ./build.sh. Then launch the container with ./run.sh. Note that we use the docker -v flag to mount the current directory as the Jupyter directory (/tf), so the container works on the files from the current directory at run time.

# build the container
bash ./build.sh

# run jupyter server inside the container (follow instructions on screen)
bash ./run.sh

run.sh will launch a jupyter notebook. Simply follow the text prompts and open the URL in your host web browser: http://127.0.0.1:8888/?token=...
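Before running the notebooks, it can help to confirm that TensorFlow inside the container actually sees the GPU. The sketch below is one way to check from a notebook cell; the helper name `visible_gpus` is hypothetical, and it relies on `tf.config.list_physical_devices`, which is available in the TensorFlow 2.4 image. It returns an empty list when TensorFlow is not installed, so it is also safe to run on the host.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow not installed in this environment
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

An empty result inside the container usually points to the container being started without GPU access rather than to a problem with the notebooks.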

2. Training Code (Demo on L60 200K dataset)

We prepared a Jupyter notebook at ./Demo_Train_L60_Classifier.ipynb showcasing how to train an LSTM-based polymer knot-type classifier with TensorFlow.

The dataset used for this training notebook comprises five knot types of length L60, with 200K conformations per knot type.

This tutorial notebook reproduces the L60 results in Table 1 of our publication, i.e. a training accuracy of ~99%, a validation accuracy of ~98%, and ~98% accuracy on the hold-out test set.

Table1_L60_RNN_99acc
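The three accuracies above correspond to three disjoint subsets of the data: training, validation, and hold-out test. The sketch below shows one common way to produce such a split by shuffling indices. The 80/10/10 fractions, the seed, and the function name `split_indices` are assumptions for illustration only; the notebook defines the split actually used.

```python
import random


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# Toy example with 1000 samples (hypothetical size, not the real dataset).
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices untouched until the final evaluation is what makes the reported test accuracy a hold-out figure.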

3. Best Model with Weights (trained on L100 2M dataset)

The best RNN model for L100 is a bidirectional LSTM stack with dropout, trained on 2 million conformations for each knot type. The model with weights (~10 MB) can be loaded directly from this repo at best_models/temp_20191103-175055_L100_2M_0-31-41-52-51-relative_BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5, using keras.models.load_model():

from tensorflow import keras

# Best RNN model:
# Nov03 99.59acc BiLSTM stacks
best_model_dir = "best_models/"
save_model_name = best_model_dir + \
    "temp_20191103-175055_L100_2M_0-31-41-52-51-relative_BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5"

print("model to be used: ", save_model_name)

model = keras.models.load_model(save_model_name)

model.summary()
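The model file name appears to encode the layer stack. As a rough aid for reading it, the sketch below splits the last field of the name into (token, number) pairs. Interpreting `BiLSTM240` as a bidirectional LSTM layer with 240 units and `Dp20` as 20% dropout is our assumption; `model.summary()` remains the authoritative description. The function name `parse_layer_tokens` is hypothetical.

```python
import re

MODEL_NAME = (
    "temp_20191103-175055_L100_2M_0-31-41-52-51-relative_"
    "BiLSTM240BiLSTM200Dp20BiLSTM180BiLSTM180BiLSTM100_.h5"
)


def parse_layer_tokens(model_name):
    """Split the architecture field of the file name into (token, number) pairs."""
    stem = model_name.rsplit(".", 1)[0]  # drop the .h5 extension
    # The architecture is the last non-empty underscore-delimited field.
    arch_field = [part for part in stem.split("_") if part][-1]
    pairs = re.findall(r"(BiLSTM|Dp)(\d+)", arch_field)
    return [(token, int(number)) for token, number in pairs]


layers = parse_layer_tokens(MODEL_NAME)
for token, number in layers:
    print(token, number)
```

Comparing this list with the output of `model.summary()` is a quick sanity check that the loaded file is the intended model.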

4. Generalize to L60 Sub-length

The notebook ./Generalize_SubLength_L60_Fig7.ipynb showcases how the L100 classifier generalizes to identify knots in shorter L60 polymers. The model was trained on polymers of length L100 and then asked to predict knot-type labels for 1 million L60 polymers. This tutorial reproduces Figure 7 (left panel) of the publication.

Fig7_L100trained_predictL60
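Predicting a knot-type label from a classifier output typically means taking the class with the highest score. The sketch below decodes rows of per-class probabilities, such as those returned by a Keras `model.predict` call, into label names. The class order in `KNOT_LABELS` follows the order in the dataset file names and is an assumption; the notebook defines the actual mapping. The probability rows are made up for illustration.

```python
# Assumed class order, taken from the dataset file names.
KNOT_LABELS = ["knot-0", "knot-31", "knot-41", "knot-52", "knot-51"]


def decode_predictions(prob_rows, labels=KNOT_LABELS):
    """Map each row of class probabilities to the label with the highest score."""
    decoded = []
    for row in prob_rows:
        if len(row) != len(labels):
            raise ValueError(f"expected {len(labels)} scores, got {len(row)}")
        best_index = max(range(len(row)), key=row.__getitem__)  # argmax
        decoded.append(labels[best_index])
    return decoded


# Made-up probability rows, one per conformation.
example_probs = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.02, 0.10, 0.80, 0.05, 0.03],
    [0.10, 0.10, 0.10, 0.10, 0.60],
]
predicted = decode_predictions(example_probs)
print(predicted)
```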

5. Generalize to a Different Bending Stiffness

The notebook ./Generalize_Bending_Stiffness_Fig11.ipynb loads the best weights for a model trained on L100 and predicts on unseen conformations with a different bending stiffness.

In the body of the paper, the polymer conformations are generated with a persistence length Lp = 4a. To examine whether our model also works for polymer conformations with a different bending stiffness, we generate 20 000 conformations of Lpolymer = 100 and Lp = 2a for each of the five knot types. Then, we apply the RNN model trained on conformations with Lp = 4a to classify these new conformations with Lp = 2a. The prediction accuracy is above 99% for every knot type, as shown in Fig. 11 of our PRE publication:

Fig11_L100_Lp2a_persistence_length

These results suggest that the prediction accuracy of our NN is robust to different bending stiffness.
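A claim such as "above 99% for every knot type" is a statement about per-class accuracy rather than overall accuracy. The sketch below computes accuracy separately for each true label. The function name `per_class_accuracy` and the toy labels are hypothetical and are not results from the paper.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's samples predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists differ in length")
    totals = Counter(true_labels)
    correct = Counter(
        true for true, pred in zip(true_labels, predicted_labels) if true == pred
    )
    return {label: correct[label] / totals[label] for label in totals}


# Toy data: four samples per class, one knot-31 sample misclassified.
toy_true = ["knot-0"] * 4 + ["knot-31"] * 4
toy_pred = ["knot-0"] * 4 + ["knot-31"] * 3 + ["knot-41"]

accuracy = per_class_accuracy(toy_true, toy_pred)
all_above_threshold = all(value > 0.99 for value in accuracy.values())
print(accuracy, all_above_threshold)
```

Reporting the minimum over classes guards against a high overall accuracy hiding one poorly classified knot type.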
