Materials and problem sets for the course Machine Learning for Molecular Engineering (Spring 2022) taught at MIT.
Instructors: Prof. Connor Coley, Prof. Ernest Fraenkel, and Prof. Rafael Gomez-Bombarelli
Teaching Assistants: Kevin Greenman, Vikram Sundar
Course Numbers: 3.C01/3.C51, 10.C01/10.C51, 20.C01/20.C51
Ungraded problem set (no submission) to practice using Google Colab and NumPy.
Data size: ~10^2
A basic linear classification problem to get you started in the course. You will use logistic regression to diagnose cancer, apply linear methods with L1 and L2 regularization, and examine how each penalty affects your regression results. You will also experiment with hyperparameter optimization, using cross-validation to tune your model.
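To make the effect of regularization concrete, here is a minimal sketch of logistic regression fitted by gradient descent with an L2 penalty, written in plain NumPy. It is not the problem set's solution: the data are synthetic stand-ins for the cancer dataset, and the function name `fit_logistic` and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

def fit_logistic(X, y, l2=0.0, lr=0.1, n_steps=2000):
    """Fit logistic regression by gradient descent with an L2 penalty on the weights."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad_w = X.T @ (p - y) / n + l2 * w      # L2 term; the bias is not penalized
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-in data (assumption): 200 samples, 5 features, 2 of them irrelevant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = (X @ true_w + 0.5 * rng.normal(size=200) > 0).astype(float)

w_weak, b_weak = fit_logistic(X, y, l2=0.0)
w_strong, b_strong = fit_logistic(X, y, l2=1.0)

acc_weak = np.mean(((X @ w_weak + b_weak) > 0) == (y == 1))
print("weight norm without penalty:", np.linalg.norm(w_weak))
print("weight norm with l2=1.0:   ", np.linalg.norm(w_strong))
print("training accuracy without penalty:", acc_weak)
```

The L2 penalty shrinks all weights toward zero. An L1 penalty would instead add a term proportional to `np.sign(w)` to the gradient and tends to drive some weights exactly to zero. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand as it is here.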
Data size: ~10^3
You will apply an MLP regressor to predict properties of perovskites. You will compare different representations of the chemical composition of a perovskite crystal. You will also use hyperopt to perform a hyperparameter search for your MLP architecture.
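As an illustration of what "different representations" of a composition can mean, the sketch below encodes an ABX3 perovskite in two ways: as a vector of atomic fractions and as concatenated one-hot vectors for the A and B sites. These two encodings, the element vocabulary, and the function names are assumptions for illustration and are not necessarily the representations used in the problem set.

```python
import numpy as np

# Hypothetical element vocabulary, for illustration only
ELEMENTS = ["Ca", "Sr", "Ba", "Ti", "Zr", "O"]
INDEX = {el: i for i, el in enumerate(ELEMENTS)}

def fractional_composition(formula):
    """Map {element: count} to a vector of atomic fractions over ELEMENTS."""
    vec = np.zeros(len(ELEMENTS))
    for el, count in formula.items():
        vec[INDEX[el]] = count
    return vec / vec.sum()

def site_one_hot(a_site, b_site):
    """Concatenate one-hot vectors for the A-site and B-site elements."""
    vec = np.zeros(2 * len(ELEMENTS))
    vec[INDEX[a_site]] = 1.0
    vec[len(ELEMENTS) + INDEX[b_site]] = 1.0
    return vec

# SrTiO3 as an example composition
x_frac = fractional_composition({"Sr": 1, "Ti": 1, "O": 3})
x_site = site_one_hot("Sr", "Ti")
print(x_frac)
print(x_site)
```

The fractional vector ignores which crystallographic site an element occupies, while the site-wise one-hot vector keeps that information but cannot express mixed occupancy. Either vector can be fed to an MLP as its input features.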
Data size: ~10^3
You will apply an MLP regressor to predict peptide-MHC binding. You will compare different representations of the amino acid composition of a peptide. You will also use hyperopt to perform a hyperparameter search for your MLP architecture.
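The choice of peptide representation matters because some encodings keep residue order and others discard it. The sketch below contrasts a flattened position-wise one-hot encoding with a simple residue-count vector. Both encodings, the function names, and the example peptides are assumptions for illustration, not the problem set's prescribed features.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_peptide(seq):
    """Position-aware encoding: one 20-dim one-hot vector per residue, flattened."""
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, AA_INDEX[aa]] = 1.0
    return mat.ravel()

def residue_counts(seq):
    """Order-free encoding: how many times each residue occurs."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in seq:
        counts[AA_INDEX[aa]] += 1.0
    return counts

pep_a = "SIINFEKL"   # example 8-mer
pep_b = pep_a[::-1]  # same residues in reversed order

same_counts = np.array_equal(residue_counts(pep_a), residue_counts(pep_b))
same_one_hot = np.array_equal(one_hot_peptide(pep_a), one_hot_peptide(pep_b))
print("count vectors identical:", same_counts)
print("one-hot vectors identical:", same_one_hot)
```

The two peptides share a count vector but differ in their one-hot encoding, so only the latter lets a model distinguish them.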
Data size: ~10^4
This problem set has two parts: 1) In the first part, you will use PyTorch to train an LSTM-based classifier to classify DNA binding sites. 2) In the second part, you will reduce a high-dimensional dataset to lower dimensions with PCA and t-SNE, and assess whether the resulting low-dimensional embedding is meaningful.
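For the dimensionality-reduction part, the following is a minimal sketch of PCA computed from the singular value decomposition of the centered data matrix. The synthetic dataset and the function name `pca` are assumptions for illustration; t-SNE is not covered here because it has no comparably short closed-form implementation.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]             # principal directions (rows)
    explained_var = S**2 / (X.shape[0] - 1)    # variance along each direction
    ratio = explained_var[:n_components] / explained_var.sum()
    return X_centered @ components.T, ratio

# Synthetic data (assumption): 50-dim observations driven by 2 latent factors plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

Z, ratio = pca(X, n_components=2)
print("embedding shape:", Z.shape)
print("explained variance ratio:", ratio)
```

One simple check of whether a PCA embedding is meaningful is the explained variance ratio: here two components capture nearly all of the variance because the data were constructed from two latent factors.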
Data size: ~10^6
This problem set is more substantial than the previous ones. You will implement your own Graph Neural Networks to predict molecular properties, and train a Variational Auto-Encoder to generate new molecules from a learned continuous latent representation.
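The core operation in a graph neural network is a message-passing step, where each atom updates its features from those of its neighbors. Below is a minimal NumPy sketch of one such step using mean aggregation with self-loops, followed by a sum readout over atoms. This is a simplified variant chosen for illustration, not necessarily the architecture used in the problem set; the function name `gcn_layer`, the toy graph, and the identity weight matrix are all assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One message-passing step: average each atom with its neighbors, then linear map + ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)    # neighbors + self, per atom
    H_agg = (A_hat @ H) / deg                 # mean aggregation
    return np.maximum(H_agg @ W, 0.0)         # ReLU

# Toy molecular graph (assumption): a 3-atom chain C-C-O, heavy atoms only
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1, 0],
              [1, 0],
              [0, 1]], dtype=float)           # one-hot atom type: [C, O]
W = np.eye(2)                                 # identity weights so the result is easy to read

H1 = gcn_layer(A, H, W)
graph_embedding = H1.sum(axis=0)              # sum readout over atoms

# Relabeling the atoms should not change the graph-level embedding
perm = np.array([2, 0, 1])
H1_perm = gcn_layer(A[perm][:, perm], H[perm], W)
graph_embedding_perm = H1_perm.sum(axis=0)
print(H1)
print(graph_embedding, graph_embedding_perm)
```

After one step, the middle carbon carries information about the oxygen next to it. Because the readout sums over atoms, the graph-level embedding does not depend on how the atoms are numbered, which is a property a molecular property predictor needs.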
Data size: ~10^3
This problem set is an application of computer vision to molecular engineering. You will use a deep learning model to classify steel surface defects and perform image segmentation to identify cell nuclei.
Data size: ~10^3
You will complete a short clustering exercise and participate in an ML competition to predict progression-free survival of cancer patients.
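As a reference point for the clustering exercise, here is a minimal sketch of k-means (Lloyd's algorithm) in NumPy. The source does not say which clustering method the exercise uses, so k-means is an assumption, as are the function name `kmeans` and the synthetic two-blob data.

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # Distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its points; keep it if the cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers

# Synthetic data (assumption): two well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

labels, centers = kmeans(X, k=2)
print("labels of first blob: ", np.unique(labels[:50]))
print("labels of second blob:", np.unique(labels[50:]))
```

On real patient data the clusters are rarely this clean, and the result can depend on the random initialization, so it is common to run several seeds and compare.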
Data size: ~10^3
You will participate in an ML competition to predict solvation free energies of solute/solvent pairs.