Machine Learning for Molecular Engineering (3.C01/3.C51, 10.C01/10.C51, 20.C01/20.C51)

Materials and problem sets for the course Machine Learning for Molecular Engineering (Spring 2022) taught at MIT.

Instructors: Prof. Connor Coley, Prof. Ernest Fraenkel, and Prof. Rafael Gomez-Bombarelli

Teaching Assistants: Kevin Greenman, Vikram Sundar

Course Numbers: 3.C01/3.C51, 10.C01/10.C51, 20.C01/20.C51

Problem Sets

PS0

Ungraded problem set (no submission) for practicing with Google Colab and NumPy.

PS1

Data size: ~10^2

A basic linear classification problem to get you started in the course. You will use logistic regression to diagnose cancer, apply linear methods with L1 and L2 regularization, and examine how each penalty affects your regression results. You will also tune your model's hyperparameters using cross-validation.
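To give a feel for how the two penalties differ, here is a minimal sketch in plain Python. It is not the problem set's code. It assumes the simplified setting of penalized least squares with orthonormal features, where the regularized weights have closed forms: L2 rescales every unregularized weight by 1 / (1 + lam), while L1 soft-thresholds each weight by lam. Logistic regression has no such closed form, but the qualitative behavior is similar. The function names and the example weights are illustrative assumptions.

```python
def l2_shrink(weights, lam):
    """Minimizer of 0.5 * ||w - w0||^2 + 0.5 * lam * ||w||^2, applied per weight."""
    return [w / (1.0 + lam) for w in weights]


def l1_soft_threshold(weights, lam):
    """Minimizer of 0.5 * ||w - w0||^2 + lam * ||w||_1, applied per weight."""
    shrunk = []
    for w in weights:
        # Pull the magnitude toward zero by lam, clipping at zero.
        magnitude = max(abs(w) - lam, 0.0)
        shrunk.append(magnitude if w >= 0 else -magnitude)
    return shrunk


# Hypothetical unregularized weights: one large, two small.
unregularized = [3.0, -0.5, 0.2]
print("L2:", l2_shrink(unregularized, 1.0))
print("L1:", l1_soft_threshold(unregularized, 1.0))
```

With these numbers, L2 keeps all three weights nonzero but smaller, whereas L1 sets the two small weights to zero and keeps only the large one. That is why L1 is often used for feature selection.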

PS2 (Perovskites)

Data size: ~10^3

You will apply an MLP regressor to predict properties of perovskites. You will compare different representations of the chemical composition of a perovskite crystal. You will also use hyperopt to run a hyperparameter search over your MLP architecture.
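The idea behind a hyperparameter search can be sketched without any external library. The example below uses plain random search from the Python standard library as a stand-in for hyperopt; it is not hyperopt's API. The search space, the function names, and `toy_objective` (a hypothetical placeholder for "train an MLP and return its validation loss") are all assumptions made for illustration.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical MLP configuration from the search space."""
    return {
        "hidden_layers": rng.choice([1, 2, 3]),
        "hidden_units": rng.choice([32, 64, 128, 256]),
        # Sample the learning rate log-uniformly between 1e-4 and 1e-2.
        "learning_rate": 10 ** rng.uniform(-4, -2),
    }


def toy_objective(config):
    """Placeholder loss; a real objective would train and validate a model."""
    width_term = abs(config["hidden_units"] - 128) / 128
    lr_term = abs(math.log10(config["learning_rate"]) + 3)
    return width_term + lr_term


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((objective(config), config))
    best = min(trials, key=lambda trial: trial[0])
    return best, trials


best, trials = random_search(toy_objective, n_trials=20)
print("best loss:", best[0])
print("best config:", best[1])
```

Libraries such as hyperopt follow the same loop of sampling, evaluating, and keeping the best result, but can choose new configurations based on earlier results instead of sampling blindly.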

PS2 (MHC)

Data size: ~10^3

You will apply an MLP regressor to predict MHC binding to peptides. You will compare different representations of the amino acid composition of a peptide. You will also use hyperopt to run a hyperparameter search over your MLP architecture.
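As an illustration of what "different representations" can mean for a peptide, the sketch below builds two common encodings in plain Python: a position-aware one-hot vector and an order-independent composition vector. These are assumed examples and not necessarily the representations used in the problem set; the function names are hypothetical. The alphabet is the 20 standard amino acids in one-letter code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Concatenate one 20-dimensional indicator vector per position."""
    vec = [0.0] * (len(peptide) * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def composition_encode(peptide):
    """Fraction of each residue type, ignoring residue order."""
    counts = [0.0] * len(AMINO_ACIDS)
    for aa in peptide:
        counts[AA_INDEX[aa]] += 1.0
    return [c / len(peptide) for c in counts]


# "AAC" and "ACA" share a composition but differ in their one-hot encoding.
print(composition_encode("AAC") == composition_encode("ACA"))
print(one_hot_encode("AAC") == one_hot_encode("ACA"))
```

The one-hot vector grows with peptide length and preserves position, while the composition vector has a fixed length of 20 and discards order. That trade-off is the kind of difference a regressor's accuracy can reveal.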

PS3

Data size: ~10^4

This problem set has two parts. In the first part, you will use PyTorch to train an LSTM-based classifier to classify DNA binding sites. In the second part, you will reduce a high-dimensional dataset to lower dimensions with PCA and t-SNE and assess whether the resulting low-dimensional embedding is meaningful.
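For intuition about what PCA computes, the sketch below finds the first principal component of a small dataset using power iteration on the sample covariance matrix, in plain Python. In practice a library routine would be used; this is only an illustration, and the function names and toy data are assumptions. It also assumes the data has nonzero variance and that the starting vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(points, n_iter=100):
    """Return (unit direction of largest variance, feature means)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # starting guess for the leading eigenvector
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, mean


def project(points, component, mean):
    """Coordinates of each centered point along the component."""
    return [
        sum((p[j] - mean[j]) * component[j] for j in range(len(mean)))
        for p in points
    ]


# Toy 2-D data lying on the line y = 2x.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
component, mean = first_principal_component(data)
print(component)
print(project(data, component, mean))
```

Because the toy points lie on a single line, one component captures all of their variance, and the projection reduces each 2-D point to one coordinate without losing information.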

PS4

Data size: ~10^6

This problem set is more substantial than the previous ones. You will implement your own graph neural networks to predict molecular properties and train a variational autoencoder to generate new molecules from a learned continuous latent representation.
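The core operation of a graph neural network, message passing, can be sketched in a few lines of plain Python. The example below performs one round of sum aggregation over an undirected molecular graph and then sum-pools the node vectors into a graph-level vector. It deliberately omits the learned weight matrices and nonlinearities that a real GNN layer applies; the function names and the toy water-like graph are assumptions for illustration.

```python
def message_passing_step(node_features, edges):
    """One round of sum aggregation: h_v <- h_v + sum of neighbor features."""
    neighbors = {v: [] for v in node_features}
    for u, v in edges:
        # Undirected graph: each bond contributes in both directions.
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for v, h_v in node_features.items():
        message = [0.0] * len(h_v)
        for u in neighbors[v]:
            message = [m + x for m, x in zip(message, node_features[u])]
        updated[v] = [a + b for a, b in zip(h_v, message)]
    return updated


def readout(node_features):
    """Sum-pool all node vectors into a single graph-level vector."""
    dim = len(next(iter(node_features.values())))
    return [sum(h[j] for h in node_features.values()) for j in range(dim)]


# Toy graph: atom 0 is O, atoms 1 and 2 are H; features are [is_O, is_H].
atoms = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (0, 2)]
updated = message_passing_step(atoms, bonds)
print(updated)
print(readout(updated))
```

After one step, each atom's vector also encodes its bonded neighbors; stacking several steps lets information travel further across the molecule before the readout feeds a property predictor.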

PS5

Data size: ~10^3

This problem set is an application of computer vision to molecular engineering. You will use a deep learning model to classify steel surface defects and perform image segmentation to identify cell nuclei.

PS6 (Cancer)

Data size: ~10^3

You will complete a short clustering exercise and participate in an ML competition to predict progression-free survival of cancer patients.

PS6 (Molecule)

Data size: ~10^3

You will participate in an ML competition to predict solvation free energies of solute/solvent pairs.
