
rebeccabilbro/navyfcu-ml

 
 


Generalized Machine Learning

This repository contains notebooks, data, and slides for a training that surveys generalized machine learning and distributed computing, held from September 14, 2018 to September 28, 2018. During this three-day course, we will cover the following topics:

Day 1:

  • ML Review: Generalized ML and Spatial Learning, Bias/Variance Tradeoff, Model Selection Triple
  • Regularized Regression: LASSO vs Ridge; ElasticNet and more
  • Clustering: Partitive vs Agglomerative Clustering; clustering evaluation methods, visualization
  • Classification I: Instance and Inductive Models (kNN, Decision Trees, Ensembles of Trees)
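As a rough illustration of the LASSO vs Ridge contrast listed for Day 1, the sketch below shrinks a single coefficient under each penalty. It assumes the simplified one-coefficient case where the ordinary least squares estimate is already known; the function names, the penalty value, and the sample coefficients are all hypothetical and are not taken from the course notebooks.

```python
def ridge_shrink(b_ols, lam):
    """Minimizer of 0.5*(b_ols - b)**2 + 0.5*lam*b**2 (L2 penalty)."""
    # Ridge scales the estimate toward zero but never reaches it exactly.
    return b_ols / (1.0 + lam)


def lasso_shrink(b_ols, lam):
    """Minimizer of 0.5*(b_ols - b)**2 + lam*abs(b) (L1 penalty)."""
    # LASSO soft-thresholds: small estimates are set exactly to zero.
    magnitude = max(abs(b_ols) - lam, 0.0)
    return magnitude if b_ols >= 0 else -magnitude


# Hypothetical OLS estimates and penalty strength, for illustration only.
ols_coefficients = [3.0, -1.5, 0.4]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols_coefficients]
lasso = [lasso_shrink(b, lam) for b in ols_coefficients]

print("ridge:", ridge)
print("lasso:", lasso)
```

The smallest coefficient survives under the ridge penalty but is zeroed by the LASSO penalty, which is why LASSO is often used for feature selection.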

Day 2:

  • Classification II: Parametric Models: SVMs, Bayesian Models, Logistic Regression
  • Dimensionality Reduction and Manifolds: PCA, SVD, t-SNE, Isomap
  • Neural Networks I: Multi-Layer Perceptrons
  • Neural Networks II: Deep Learning and TensorFlow

Day 3:

  • Introduction to Spark: RDDs and Architecture
  • Programming Spark: interactive analysis and distributed jobs
  • Using Spark for data analysis: Spark SQL and Spark DataFrames
  • Spark for distributed ML: Spark MLlib

Notes:

  • class experience with Logistic Regression and ANNs
  • background is mostly math and stats, not computational
  • don't rely on Python or coding knowledge; do exercises as live demos
  • focus on feature analysis and hyperparameter tuning
  • visual analysis with Yellowbrick (YB) is a big help!
  • for distributed computing, focus on high level computing issues, not mechanisms
  • no need for a cluster or workshops on the distributed computing day

Other Notes:

  • Classification Metrics II to follow (ROC/AUC, DecisionThreshold, PR Curves, Class Balance issues)
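For the ROC/AUC topic noted above, here is a minimal sketch of what the AUC measures: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the sample labels and scores are hypothetical; in practice a library routine would normally be used instead of this quadratic-time loop.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, independent of any single decision threshold.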

About

Notebooks and data for Machine Learning course.
