Materials and problem sets for Machine Learning for Molecular Engineering (Spring 2021) taught at MIT
Instructors: Prof. Connor Coley, Prof. Ernest Fraenkel, and Prof. Rafael Gomez-Bombarelli
Teaching Assistant: Wujie Wang
Course number: 3.100/3.322, 10.402/10.602, 20.301/20.401
data size: ~10^2
A basic linear classification problem to get you started with the course. You will use logistic regression to diagnose cancer, apply linear methods with L1 and L2 regularization, and examine how each affects your results. You will also experiment with hyperparameter optimization, tuning your model with cross-validation.
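As a rough illustration of the workflow described above, the sketch below fits L1- and L2-regularized logistic regression and tunes the regularization strength with 5-fold cross-validation. It is not the problem set's solution: the scikit-learn breast cancer dataset is an assumed stand-in for the course data, and the grid of `C` values, the train/test split, and the `results` dictionary are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's built-in breast cancer data stands in for the course dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

results = {}
for penalty in ("l1", "l2"):
    # liblinear supports both penalties; features are standardized first.
    pipe = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty=penalty, solver="liblinear", max_iter=1000),
    )
    # Illustrative grid: smaller C means stronger regularization.
    search = GridSearchCV(
        pipe, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5
    )
    search.fit(X_train, y_train)
    coef = search.best_estimator_.named_steps["logisticregression"].coef_.ravel()
    results[penalty] = {
        "best_C": search.best_params_["logisticregression__C"],
        "test_accuracy": search.score(X_test, y_test),
        "n_zero_coef": int((coef == 0).sum()),
    }

for penalty, summary in results.items():
    print(penalty, summary)
```

Comparing `n_zero_coef` between the two runs is one simple way to see the sparsity that L1 regularization tends to induce, which L2 generally does not.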
data size: ~10^3
You will apply an MLP regressor to predict properties of perovskites. You will compare different representations of the chemical composition of a perovskite crystal. You will also use hyperopt to run a hyperparameter search over your MLP architecture.
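To make "different representations of the chemical composition" concrete, here is a minimal sketch of two possible encodings for an ABO3 perovskite. Both are assumptions for illustration, not the representations used in the problem set: the element vocabulary and the helper names `fraction_vector` and `site_one_hot` are hypothetical.

```python
import numpy as np

# Hypothetical element vocabulary; the real problem set may cover other elements.
ELEMENTS = ["Ca", "Sr", "Ba", "Ti", "Zr", "O"]
A_SITE = ["Ca", "Sr", "Ba"]
B_SITE = ["Ti", "Zr"]


def fraction_vector(composition):
    """Encode a composition as atomic fractions over the element vocabulary."""
    counts = np.array([composition.get(el, 0) for el in ELEMENTS], dtype=float)
    return counts / counts.sum()


def site_one_hot(a_element, b_element):
    """Encode the A-site and B-site elements as two concatenated one-hot vectors."""
    vec = np.zeros(len(A_SITE) + len(B_SITE))
    vec[A_SITE.index(a_element)] = 1.0
    vec[len(A_SITE) + B_SITE.index(b_element)] = 1.0
    return vec


# SrTiO3 under both encodings.
frac = fraction_vector({"Sr": 1, "Ti": 1, "O": 3})
onehot = site_one_hot("Sr", "Ti")
print(frac)
print(onehot)
```

Either vector could serve as the input to an MLP regressor. The fraction vector records stoichiometry, while the site-wise one-hot vector records which element occupies which crystallographic site.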
data size: ~10^4
This problem set has two parts: 1) In the first part, you will use PyTorch to train an LSTM-based classifier to classify DNA binding sites. By building the model, you will see how a deep learning pipeline is put together. 2) In the second part, you will reduce a high-dimensional dataset to lower dimensions with PCA and t-SNE, and assess whether the resulting low-dimensional embedding is meaningful.
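The second part can be sketched roughly as follows. This is a minimal example under stated assumptions: the data are synthetic clusters from `make_blobs` rather than the course dataset, and using the silhouette score against known labels is just one possible way to judge whether an embedding is meaningful.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Assumption: synthetic 50-dimensional data with 3 known clusters.
X, labels = make_blobs(
    n_samples=300, n_features=50, centers=3, cluster_std=1.0, random_state=0
)

# Linear projection onto the two directions of largest variance.
pca_embedding = PCA(n_components=2).fit_transform(X)

# Nonlinear embedding that tries to preserve local neighborhoods.
tsne_embedding = TSNE(
    n_components=2, perplexity=30, random_state=0
).fit_transform(X)

# One possible check: do points with the same label stay together in 2D?
pca_score = silhouette_score(pca_embedding, labels)
tsne_score = silhouette_score(tsne_embedding, labels)
print(f"PCA silhouette:   {pca_score:.2f}")
print(f"t-SNE silhouette: {tsne_score:.2f}")
```

Keep in mind that t-SNE distances between clusters are not directly interpretable, so a score like this is a sanity check rather than proof that the embedding captures real structure.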
data size: ~10^6
This problem set is more substantial than the previous ones. You will implement your own graph neural networks to predict molecular properties and train a variational autoencoder to generate new molecules from a learned continuous latent representation.
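To give a feel for what a graph neural network layer computes, the sketch below implements a single GCN-style message-passing step in NumPy, followed by a sum readout. It is an illustration under assumptions, not the architecture the problem set asks for: the toy graph (the heavy atoms of ethanol), the one-hot atom features, and the random weights are all hypothetical.

```python
import numpy as np


def gcn_layer(adj, feats, weight):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees including self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


# Toy molecular graph: heavy atoms of ethanol, C-C-O.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
# One-hot atom features in the order [C, O].
feats = np.array([[1, 0],
                  [1, 0],
                  [0, 1]], dtype=float)

rng = np.random.default_rng(0)
weight = rng.normal(size=(2, 4))            # untrained, random weights

hidden = gcn_layer(adj, feats, weight)      # per-atom hidden states, shape (3, 4)
graph_vec = hidden.sum(axis=0)              # sum readout gives a molecule-level vector
print(hidden.shape, graph_vec.shape)
```

Because the readout sums over atoms, `graph_vec` does not depend on the order in which atoms are listed, which is the property that makes such a vector usable as input to a molecular property predictor.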
data size: ~10^3 images
Applications of computer vision in molecular engineering. You will use deep learning models to classify steel surface defects and predict segmentation masks for cell nuclei.
PS6
You will participate in two ML competitions: