---
title: "In-class Kaggle Competition"
---
### What is Kaggle?
[Kaggle](https://www.kaggle.com) is a large data science community where machine learning practitioners from around the world compete to solve prediction problems. The data sets used in Kaggle competitions are uploaded by public and private companies (e.g., Google, Facebook) as well as by government agencies (e.g., the U.S. Department of Homeland Security).

A "Kaggler" wins a competition if her algorithm is the most accurate on the competition's test set. Winners receive financial rewards, job offers, and recognition from the community.

Kaggle competitions are one of the best places to practice your ML skills and learn about state-of-the-art ML methods. For instance, you can learn a lot about the tricks of the trade by reading interviews with past winners.
### __ml4econ__ Kaggle Competition
One of your tasks in this course will be a Kaggle competition. You will use the "Boston Housing Data" to train and test the machine learning models covered in the course, with the goal of predicting the median house value from a set of area-specific housing market features. For more details, please visit the competition's [website](https://www.kaggle.com/t/97eb0edcbe7c406882c7c067076bedd3).
### Getting Started
1. Visit [www.kaggle.com](https://www.kaggle.com) and open an account.
2. Go to the ml4econ competition [webpage](https://www.kaggle.com/t/97eb0edcbe7c406882c7c067076bedd3).
3. Review the competition details: objectives, deadline, data, evaluation, submission rules, etc.
### Basic Kaggle Competition Workflow
1. Acquire domain knowledge.
2. Explore the data.
3. Preprocess the data (standardization, dummies, interactions, etc.).
4. Choose a model class (Lasso, trees, NN, ensembles, etc.).
5. Tune model complexity (cross-validation).
6. Submit your predictions.

Important note: Examining and preprocessing the data with the help of domain knowledge (a.k.a. "feature engineering") is probably one of the most important steps in applied ML.
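
Steps 3 to 5 of the workflow can be illustrated with a minimal sketch. It is written in plain Python and is not the course's reference solution. The data are synthetic (a single made-up feature and target, not the Boston Housing Data), the model is a one-feature ridge regression chosen only because it has a simple closed form, and the penalty grid and the helper names `fit_ridge` and `cv_mse` are assumptions made for illustration.

```python
import random
import statistics


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_ridge(x_train, y_train, penalty):
    """Fit a one-feature ridge regression and return a prediction function.

    The feature is standardized with training-set statistics only, so no
    information from the validation fold leaks into the preprocessing step.
    """
    x_mean = statistics.fmean(x_train)
    x_sd = statistics.pstdev(x_train)
    y_mean = statistics.fmean(y_train)
    z = [(v - x_mean) / x_sd for v in x_train]
    slope = sum(zi * (yi - y_mean) for zi, yi in zip(z, y_train)) / (
        sum(zi * zi for zi in z) + penalty
    )
    return lambda v: y_mean + slope * (v - x_mean) / x_sd


def cv_mse(x, y, penalty, n_folds=5):
    """Average validation MSE over n_folds interleaved folds."""
    n = len(x)
    fold_errors = []
    for k in range(n_folds):
        val = list(range(k, n, n_folds))
        train = [i for i in range(n) if i % n_folds != k]
        predict = fit_ridge([x[i] for i in train], [y[i] for i in train], penalty)
        fold_errors.append(mse([y[i] for i in val], [predict(x[i]) for i in val]))
    return statistics.fmean(fold_errors)


# Synthetic data (hypothetical): one feature and a noisy linear target.
rng = random.Random(0)
x = [rng.uniform(3.0, 9.0) for _ in range(100)]
y = [5.0 * xi - 10.0 + rng.gauss(0.0, 3.0) for xi in x]

# Tune complexity: keep the penalty with the lowest cross-validated MSE.
penalty_grid = [0.0, 1.0, 10.0, 100.0, 1000.0]
cv_scores = {p: cv_mse(x, y, p) for p in penalty_grid}
best_penalty = min(cv_scores, key=cv_scores.get)
print(cv_scores)
print("selected penalty:", best_penalty)
```

Standardizing inside each fold, rather than once on the full data, keeps the validation error an honest estimate of out-of-sample performance. The same pattern carries over to richer model classes and to real features.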
### How Does Kaggle Ranking Work?
* The mean squared error (MSE) on the public test set is available immediately upon submission.
* The MSE on the private test set is available only once the competition closes.
* The split between the public and private test sets is arbitrary and unknown in advance.

The competition's final ranking is based on how well individuals perform on the **PRIVATE** test set.
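
To make the public/private distinction concrete, here is a small sketch of how one set of predictions yields two different scores. Everything in it is an assumption for illustration: the true values and predictions are simulated, and the 50/50 random split is hypothetical, since the actual split is unknown to participants.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(1)
n_test = 200

# Simulated test-set targets and one participant's (noisy) predictions.
y_true = [rng.gauss(22.0, 9.0) for _ in range(n_test)]
y_pred = [t + rng.gauss(0.0, 3.0) for t in y_true]

# Hypothetical hidden split: half of the test rows are public, half private.
indices = list(range(n_test))
rng.shuffle(indices)
public_idx = indices[: n_test // 2]
private_idx = indices[n_test // 2:]

# Shown on the leaderboard right after submission.
public_mse = mse([y_true[i] for i in public_idx], [y_pred[i] for i in public_idx])
# Revealed only when the competition closes; determines the final ranking.
private_mse = mse([y_true[i] for i in private_idx], [y_pred[i] for i in private_idx])

print("public MSE: ", round(public_mse, 3))
print("private MSE:", round(private_mse, 3))
```

The two scores generally differ because they are computed on different rows. This is why tuning a model to the public leaderboard can hurt the final ranking.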
### Resources
* Dataquest tutorial: [Kaggle Fundamentals: The Titanic Competition](https://www.dataquest.io/blog/kaggle-fundamentals/)