buswedg/Coursera

Project
Data Science Specialization
Author: Darryl Buswell
Expertise: Data Applications, Exploratory Analysis, Machine Learning, Statistical Inference
Tool: R/R-Studio, Shiny
Industry: Energy, Environment, Health Care, Information Technology, Transportation
Description

Concepts and tools needed throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results.

Includes:

  • Practical application of statistical computing through reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code.
  • Basic data cleaning of an 'activity recognition dataset' of 30 subjects who wore waist-mounted smartphone sensors. Includes R code to load the raw dataset, with processing instructions formalized in a Markdown-based codebook.
  • Exploratory analysis techniques in R for summarizing data, including how to implement multivariate statistical techniques and use plotting systems in order to summarize high-dimensional data.
  • Use of R tools to generate data analysis in a markdown document, with a focus on providing results which can be easily reproduced. R Markdown integrates live R code with knitr and related tools.
  • Collection of R scripts which employ fundamentals of statistical inference, including broad theories such as frequentist, Bayesian, and likelihood-based inference.
  • Regression analysis performed on a collection of cars in order to explore the relationship between car features and fuel consumption. Includes special cases of the regression model, ANOVA and ANCOVA with analysis of dummy variables, multivariable adjustment, residuals and variability.
  • Application of machine learning algorithms (decision tree, random forest and generalized boosted regression) using R, in order to explore personal activity data and predict the manner in which individuals completed particular exercises.
  • A simple, yet scalable, web application built using Shiny, R packages, and interactive graphics, with a focus on automating statistical inference of a dataset related to passengers onboard the Titanic.
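As a rough illustration of the regression item above, the following is a minimal sketch of a one-predictor least-squares fit. It is written in Python rather than R, the function name `fit_simple_ols` is hypothetical, and the numbers are synthetic values chosen to lie on an exact line; they are not taken from the Motor Trend dataset or from the repository's code.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Synthetic example (assumption): fuel economy falls by 5 units per unit of weight.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [40.0 - 5.0 * w for w in weights]
intercept, slope = fit_simple_ols(weights, mpg)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Because the synthetic points lie exactly on a line, the fit recovers the generating coefficients; real data would leave residuals to inspect.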
Dataset
  • Air pollution monitoring data at 332 locations in the US. [link]
  • Patient quality of care statistics for over 4,000 US hospitals from the Medicare.gov Hospital Compare service. [link]
  • Activity recognition data set built from the recordings of 30 subjects performing basic activities and postural transitions while carrying a waist-mounted smartphone with embedded inertial sensors, from the UCI Machine Learning Repository. [link]
  • Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years, from the UCI Machine Learning Repository. [link]
  • Fine particulate matter (PM2.5) air pollutant data for the US for the period of 1999-2008, from the EPA National Emissions Inventory. [link]
  • Data from a monitoring device (number of steps taken) worn by an anonymous individual between Oct-Nov 2014. [link]
  • 'Storm Data' publication data from the National Oceanic and Atmospheric Administration (NOAA) for the period of 1950-2011. [link]
  • The response in the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs that received one of three dose levels of vitamin C by one of two delivery methods. [link]
  • Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). [link]
  • Weight lifting exercise data from accelerometers on the belt, forearm, arm, and dumbbell of six participants. [link]
  • Data (age, gender, fare, cabin, etc.) for passengers who were onboard the Titanic. [link]

Project
Fundamentals of Computing Specialization
Author: Darryl Buswell
Expertise: Data Applications, Statistical Inference
Tool: Python
Industry: Entertainment, Information Technology
Description

Introduction to Python, with a focus on mathematical and programming techniques, and mathematical tools for reasoning about the correctness and efficiency of algorithms.

Includes:

  • A number of basic interactive applications (games) built using Python, including 'Rock-Paper-Scissors-Lizard-Spock', 'Guess the Number', 'Stopwatch: The Game', and 'Pong'.
  • A number of basic/intermediate interactive applications (games) built using Python, including 'Memory', 'Blackjack', and 'Spacerocks'.
  • A number of intermediate interactive applications (games) built using Python, including 'Solitaire Mancala', '2048', and 'Tic-Tac-Toe'.
  • Algorithmic thinking to solve real-world problems, including: 1) understanding the problem; 2) formulating the problem mathematically; 3) designing an algorithm; 4) implementing the algorithm; and 5) solving the original scientific problem.
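To give a flavor of the 'Rock-Paper-Scissors-Lizard-Spock' game listed above, here is a minimal sketch of one common way to decide a round with modular arithmetic. The ordering of names and the function name `winner` are assumptions made for illustration, not necessarily what the repository's code uses.

```python
# Order the choices so that each one beats the two that precede it cyclically.
NAMES = ["rock", "Spock", "paper", "lizard", "scissors"]


def winner(player, computer):
    """Return 'player', 'computer', or 'tie' for one round."""
    diff = (NAMES.index(player) - NAMES.index(computer)) % 5
    if diff == 0:
        return "tie"
    # A difference of 1 or 2 means the player's choice beats the computer's.
    return "player" if diff in (1, 2) else "computer"


print(winner("paper", "rock"))      # paper covers rock
print(winner("lizard", "Spock"))    # lizard poisons Spock
print(winner("scissors", "Spock"))  # Spock smashes scissors
```

Encoding the rules as a cyclic ordering avoids writing out all ten winning pairs by hand, at the cost of making the rule table less explicit.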
Dataset

Project
Machine Learning
Author: Darryl Buswell
Expertise: Machine Learning
Tool: Matlab/Octave
Industry: Education, Environment, Food, Beverages and Tobacco, Housing, Information Technology, Manufacturing
Description

Machine learning, data mining, and statistical pattern recognition using GNU Octave. Covers: 1) supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks); 2) unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning); and 3) best practices in machine learning (bias/variance theory and the innovation process in machine learning and AI).

Includes:

  • Implementation of linear regression analysis with one variable (city population) to predict profits for a food truck which is to operate in different cities.
  • Linear regression analysis with multiple variables (including living area size and number of bedrooms) to predict the house prices in Portland, Oregon.
  • Implementation of logistic regression analysis to predict the chance a student would be admitted into a university based on their results from two standardized tests.
  • Predict whether microchips from a fabrication plant would pass quality assurance standards based on results from two tests.
  • Logistic regression analysis and feedforward propagation neural network, used to recognize images of handwritten digits (from 0 to 9).
  • Backward propagation algorithm to learn parameters for a neural network, used to recognize images of handwritten digits (from 0 to 9).
  • Regularized linear regression to predict the amount of water flowing out of a dam using the change of water level in a reservoir.
  • Implementation of support vector machine classifier to build an email spam filter.
  • Implementation of K-means clustering algorithm to compress the size of an image by reducing its number of colors.
  • Principal component analysis (PCA) to perform dimensionality reduction on a dataset of 5,000 face images.
  • Anomaly detection algorithm, applied in order to detect failing servers on a network.
  • Utilization of collaborative filtering in order to build a recommender system for movies.
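The K-means image compression item above can be sketched roughly as follows. This is a pure-Python illustration rather than the Octave code in the repository; the helper names, the four made-up RGB pixels, and the fixed starting centroids are all assumptions chosen to keep the example small and deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/update steps and return the final centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def compress(points, centroids):
    """Replace every pixel with its nearest palette color."""
    return [min(centroids, key=lambda c: sq_dist(p, c)) for p in points]


# Made-up pixels (assumption): two reddish and two bluish RGB values.
pixels = [(250, 10, 10), (240, 20, 20), (10, 10, 250), (20, 20, 240)]
palette = kmeans(pixels, centroids=[pixels[0], pixels[2]])
compressed = compress(pixels, palette)
print(palette)
```

Storing one palette index per pixel instead of a full RGB triple is what yields the size reduction; a practical run would typically pick the starting centroids at random and may need several restarts.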
Dataset
  • Food truck profit and population data for the various cities in which those food trucks operate.
  • Housing data for Portland, Oregon, including house price, living area and number of bedrooms.
  • Dataset representing 80 students who were/were not admitted into college based on results of two standardized tests.
  • Quality assurance data for microchips from a fabrication plant.
  • Examples of handwritten digits from the MNIST database. [link]
  • Dataset of a dam water level through time.
  • Collection of spam and non-spam emails from a subset of the SpamAssassin Public Corpus. [link]
  • Image of a small bird.
  • 5,000 face images.
  • Server performance data, including throughput (mb/s) and latency (ms) for 307 servers.
  • Dataset of movie ratings for 1,682 movies, ranked by 943 users on a scale from 1 to 5. [link]
