Author: Carl McBride Ellis (LinkedIn)

The following is a selection of my kaggle notebooks:

1. eda (exploratory data analysis)

Anscombe's quartet and the importance of EDA (+ dataset)
Absolute beginners Titanic 'EDA' using dabl
Exploratory data analysis using pandas pivot table
Pearson correlation coefficient, mutual information (MI) and Predictive Power Score (PPS) - a simple comparison
Use case example: Jane Street: EDA of day 0 and feature importance
Use case example: Riiid: EDA and feature importance
Use case example: Ventilator Pressure: EDA and simple submission
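To give a flavour of why the first notebook in this list stresses EDA, here is a minimal sketch (not taken from the notebook itself) that computes summary statistics for the four datasets of Anscombe's quartet. The helper names `mean`, `pearson` and `summarize` are my own illustrative choices, and the data values are the ones published by Anscombe (1973). The four datasets share nearly identical means and correlation, even though they look completely different when plotted.

```python
from math import sqrt

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def summarize(quartet):
    """Return {name: (mean of x, mean of y, Pearson r)} for each dataset."""
    return {name: (mean(x), mean(y), pearson(x, y))
            for name, (x, y) in quartet.items()}


if __name__ == "__main__":
    for name, (mx, my, r) in summarize(QUARTET).items():
        print(f"{name:>3}: mean(x)={mx:.2f}  mean(y)={my:.2f}  r={r:.3f}")
```

Summary statistics alone cannot tell these datasets apart, which is the usual argument for plotting the data before modelling it.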

2. data cleaning / preparation

3. classification / regression

This is a collection of my Python example scripts for classification, using the Titanic: Machine Learning from Disaster competition data, and for regression, using the House Prices: Advanced Regression Techniques competition data:

algorithm	classification	regression
Logistic regression	link	---
Generalized Additive Models (GAM)	link	---
Iterative Dichotomiser 3 (ID3)	link	---
Decision tree	link	---
Regularized Greedy Forest (RGF)	link	link
XGBoost	---	link
TabNet	link	link
Neural networks (using keras)	link	link
Gaussian process	link	link
Hyperparameter grid search	link	link

4. conformal prediction

5. feature selection / engineering

6. time series and forecasting

Prediction intervals

Probabilistic forecasting using GluonTS: Bitcoin
[PFI Starter] Skforecast example - starter notebook provided for the "Probabilistic forecasting I: Temperature" competition

7. ensemble methods

8. explainability

9. causality

10. statistics

11. didactic notebooks

Beautiful math in your notebook: a guide to using $\LaTeX$ math markup in kaggle notebooks.
Titanic: In all the confusion... which looks at the confusion matrix, ROC curves, $F_1$ scores etc.
Classification: How imbalanced is "imbalanced"? - (mentioned in "Notebooks of the week: Hidden Gems")
Overfitting and underfitting the Titanic
False positives, false negatives and the discrimination threshold
Introduction to the Regularized Greedy Forest (using rgf_python)
Extrapolation: Do not stray out of the forest!
Titanic: some sex, a bit of class, and a tree...
The Lehmer RNG algorithm for seed=42
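As a rough illustration of the last item in this list, a Lehmer random number generator is the multiplicative recurrence $x_{n+1} = a \, x_n \bmod m$. The sketch below is not taken from the notebook: the multiplier $a = 16807$ and modulus $m = 2^{31} - 1$ are the classic "minimal standard" (Park and Miller) parameters, which I am assuming here purely for the example, and `lehmer` and `first_values` are illustrative names.

```python
from itertools import islice

# Assumed parameters: the classic "minimal standard" Lehmer generator.
MODULUS = 2**31 - 1   # a Mersenne prime
MULTIPLIER = 16807    # 7**5


def lehmer(seed, multiplier=MULTIPLIER, modulus=MODULUS):
    """Yield the Lehmer sequence x_{n+1} = multiplier * x_n mod modulus."""
    if not 0 < seed < modulus:
        raise ValueError("seed must satisfy 0 < seed < modulus")
    state = seed
    while True:
        state = (multiplier * state) % modulus
        yield state


def first_values(seed, count):
    """Return the first `count` values of the sequence for a given seed."""
    return list(islice(lehmer(seed), count))


if __name__ == "__main__":
    # The same seed always reproduces the same sequence.
    print(first_values(42, 5))
```

Because the sequence is fully determined by the seed, fixing `seed=42` makes every run reproducible, which is why a fixed seed appears in so many notebooks.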

12. generative AI

13. miscellaneous

Titanic leaderboard: a score > 0.8 is great!
House Prices: How to work offline (+ dataset)
Pandas one-liners
The latest trends in data science
The Titanic using SQL
Some pretty t-SNE plots
Encuesta kaggle 2021: ¿España es diferente? (kaggle survey 2021: Is Spain different?)
How much do people on kaggle earn by country (2021)
All in a pickle: Saving the Titanic - Saving our machine learning model to a file using pickle
Machine learning review papers on arXiv [polars]

Geospatial analysis

Finance related

fun with the meta kaggle dataset

The Meta Kaggle dataset consists of data about the kaggle site itself:

Kaggle in numbers - updated almost daily
Simple EDA of kaggle Grandmasters - updated almost daily
Number of active Kaggle users
Notebooks: Number of views, and days, per vote
kaggle discussions: busiest time of the day? - (mentioned in "Notebooks of the week: Hidden Gems")
The kaggle working week
WordCloud of gold medal winning notebook titles
Shakeup interactive scatterplot maker
Shakeup scatterplots: Boxes, strings and things...
When will my notebook get its medal?

All the best!
