
kyrxanthos/msc-statistics-cw


MSc Statistics Courseworks

This repo contains some of the coursework I completed for my MSc in Statistics (Data Science) at Imperial College London in 2021-2022. Some modules are excluded, either because they had no coding component or because distributing the course content was not permitted (Deep Learning with TensorFlow).

The course description for each module is included below:

Applied Statistics

The course covered the following topics:

  • The normal linear model (estimation, residuals, residual sum of squares, goodness of fit, hypothesis testing, ANOVA, model comparison).
  • Improving designs and explanatory variables (categorical variables and multi-level regression, random and mixed effects models).
  • Diagnostics and model selection (outliers, leverage, misfit, exploratory and criterion-based model selection, Box-Cox transformations, weighted regression).
  • Generalised Linear Models (exponential family of distributions, iteratively re-weighted least squares, model selection and diagnostics).
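To illustrate iteratively re-weighted least squares, here is a minimal sketch (not the coursework's own code) that fits a logistic regression, i.e. a binomial GLM with logit link, by repeatedly solving a weighted least-squares problem on a working response. The function name `irls_logistic` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def irls_logistic(X, y, max_iter=50, tol=1e-10):
    """Hypothetical helper: logistic regression (binomial GLM, logit link) fitted by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                   # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))  # fitted means via the inverse logit link
        w = mu * (1.0 - mu)              # IRLS weights: binomial variance function
        z = eta + (y - mu) / w           # working (adjusted) response
        XtW = X.T * w                    # X^T W without forming the n x n matrix W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least-squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic data (illustrative only): true intercept -0.5, true slope 1.0
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x))))
beta_hat = irls_logistic(X, y)
```

At convergence the score equations X^T(y - mu) = 0 hold, which is the maximum-likelihood condition for a canonical-link GLM; for the logit link, IRLS coincides with Newton-Raphson.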

Big Data

The objective of this module was to become comfortable with common Big Data tools, with an emphasis on advanced statistical methods for analysis. The module focused on applying statistical methods on the processing platforms Hadoop and Spark.
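Hadoop and Spark both build on the map/shuffle/reduce pattern. The sketch below imitates that pattern in plain Python with a word count; it does not use Hadoop or Spark, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (key, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the values for each key
    return {key: sum(values) for key, values in groups.items()}

lines = ["Spark and Hadoop", "hadoop stores data", "Spark processes data"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, lines))))
```

In PySpark the same computation on an RDD of lines would read roughly `rdd.flatMap(lambda l: l.lower().split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, with the shuffle handled by the framework across the cluster.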

Computational Statistics

The course covered a number of computational methods that are key to modern statistics. Topics included:

  • Statistical computing: R programming, data structures, programming constructs, object system, graphics.
  • Numerical methods: root finding, numerical integration, optimisation methods such as EM-type algorithms.
  • Simulation: generating random variates, Monte Carlo integration.
  • Simulation approaches in inference: randomisation and permutation procedures, bootstrap, Markov chain Monte Carlo.
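As an example of a simulation approach to inference, the sketch below computes a nonparametric bootstrap standard error and a 95% percentile interval for a sample statistic. The module taught R; this version uses Python/NumPy, and the function name and sample data are hypothetical.

```python
import numpy as np

def bootstrap(data, stat=np.mean, n_boot=2000, seed=0):
    """Hypothetical helper: nonparametric bootstrap SE and percentile CI."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.size
    idx = rng.integers(0, n, size=(n_boot, n))  # each row: one resample of indices, with replacement
    reps = np.array([stat(data[row]) for row in idx])
    se = reps.std(ddof=1)                       # bootstrap standard error
    ci = np.percentile(reps, [2.5, 97.5])       # 95% percentile interval
    return se, ci

sample = np.random.default_rng(42).normal(loc=10.0, scale=2.0, size=200)
se, ci = bootstrap(sample)
```

For the sample mean the bootstrap SE should be close to the analytic value s/sqrt(n), which makes this a convenient sanity check before applying the same code to statistics without a closed-form SE.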

Data Science

This module covered computing with data, producing reproducible workflows, preparing messy real-world datasets, performing exploratory data analysis and presenting data through visualisation. It also covered the science in data science: exploring what data analysts really do and thinking critically about appropriate uses and misuses of data science.

Machine Learning

The course focused on a variety of techniques, including methods for regression, classification, feature extraction, dimensionality reduction and data clustering. State-of-the-art approaches such as random forests, neural networks, kernel methods and Gaussian processes were introduced.
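For data clustering, a common baseline is k-means. The following is a minimal NumPy sketch of Lloyd's algorithm (alternate assignment and centroid-update steps until the centres stop moving); it is an illustrative example rather than the coursework's method, and `kmeans` and the two-blob data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm for k-means clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # initialise at k random points
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centre to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                   rng.normal(10.0, 0.5, size=(50, 2))])
labels, centers = kmeans(blobs, k=2)
```

Lloyd's algorithm only finds a local optimum, so in practice it is run from several initialisations (or with k-means++ seeding) and the solution with the lowest within-cluster sum of squares is kept.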

Statistical Genetics and Bioinformatics

In this module we developed models and tools to understand complex, high-dimensional genetic datasets. This included statistical and machine learning techniques for multiple testing, penalised regression, clustering, p-value combination and dimension reduction. The module covered both frequentist and Bayesian approaches. In addition, we were introduced to genome-wide association and expression study data, next-generation sequencing and other omics datasets.
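Multiple testing is central when thousands of variants or genes are tested at once. One standard correction is the Benjamini-Hochberg step-up procedure, which controls the false discovery rate: sort the m p-values, find the largest rank i with p_(i) <= i * alpha / m, and reject every hypothesis up to that rank. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Hypothetical helper: Benjamini-Hochberg FDR control; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ranks of the p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # i * alpha / m for i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank meeting its threshold (step-up)
        reject[order[:k + 1]] = True               # reject all hypotheses up to that rank
    return reject

example = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042,
                              0.06, 0.074, 0.205, 0.212, 0.216])
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected if a larger p-value further down the ranking meets its threshold.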

Some results

Big Data

Computational Statistics

Data Science

Machine Learning

Statistical Genetics and Bioinformatics

About

MSc Statistics Courseworks of selected modules.
