
bhupeshdutt/100-Days-of-Code-Data-Science

 
 


100 Days of Data Science Code

This 100 Days of Code challenge is my plan for learning Data Science and Machine Learning from scratch. It covers:

  • Learning Fundamentals of Python
  • Python Libraries for Data Science
  • Data Manipulation and Preprocessing
  • Machine Learning Basics
  • Advanced Machine Learning Techniques
  • Deep Learning and Neural Networks
  • Model Evaluation and Deployment
  • Data Science Project and Wrap-Up


Articles Published on LinkedIn


Calendar Progress

July 2023

Sun Mon Tues Wed Thurs Fri Sat
- - - - - - 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 ✅ 19 ✅ 20 ✅ 21 ✅ 22 ✅
23 ✅ 24 ✅ 25 ✅ 26 ✅ 27 ✅ 28 ✅ 29 ✅
30 ✅ 31 ✅ - - - - -

August 2023

Sun Mon Tues Wed Thurs Fri Sat
- - 1 ✅ 2 ✅ 3 ✅ 4 ✅ 5 ✅
6 ✅ 7 ✅ 8 ✅ 9 ✅ 10 ✅ 11 ✅ 12 ✅
13 ✅ 14 ✅ 15 ✅ 16 ✅ 17 ✅ 18 ✅ 19 ✅
20 ✅ 21 ✅ 22 ✅ 23 ✅ 24 ✅ 25 ✅ 26 ✅
27 ✅ 28 ✅ 29 ✅ 30 ✅ 31 ✅ - -

September 2023

Sun Mon Tues Wed Thurs Fri Sat
- - - - - 1 ✅ 2 ✅
3 ✅ 4 ✅ 5 ✅ 6 ✅ 7 ✅ 8 ✅ 9 ✅
10 ✅ 11 ✅ 12 ✅ 13 ✅ 14 ✅ 15 ✅ 16 ✅
17 ✅ 18 ✅ 19 ✅ 20 ✅ 21 ✅ 22 ✅ 23 ✅
24 ✅ 25 ✅ 26 ✅ 27 ✅ 28 ✅ 29 ✅ 30 ✅

October 2023

Sun Mon Tues Wed Thurs Fri Sat
1 ✅ 2 ✅ 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 - - - -

100 Days of Data Science Code Day-to-Day Progress

DAY 1 (18 July 2023):

Goal: Python Basics

  • Control flow statements like if-else conditions and loops.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 2 (19 July 2023):

Goal: Functions and Modules

  • Concept of modules.
  • How to import and use built-in modules as well as create your own.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 3 (20 July 2023):

Goal: Data Structures

  • Python's built-in data structures such as lists, tuples, dictionaries, and sets.
  • Also, learn about indexing, slicing, and manipulating these data structures.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 4 (21 July 2023):

Goal: File Handling and Exception Handling

  • Read from and write to files in Python.
  • Learn about exception handling and how to handle errors using try-except blocks.
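As an illustration of the two topics above, here is a minimal sketch that writes a file, reads it back, and handles a missing file with a try-except block. The helper `read_lines` and the file names are hypothetical and are not taken from the linked source code.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file, or an empty list if it is missing."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle]
    except FileNotFoundError:
        # Handle the expected failure instead of letting the program crash.
        return []


# Write two lines to a file in a temporary directory, then read them back.
tmp_dir = tempfile.mkdtemp()
notes_path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name
with open(notes_path, "w", encoding="utf-8") as handle:
    handle.write("day 4\nfile handling\n")

lines = read_lines(notes_path)
missing = read_lines(os.path.join(tmp_dir, "does_not_exist.txt"))
```

Catching only `FileNotFoundError`, rather than a bare `except`, keeps unrelated bugs visible.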

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 5 (22 July 2023):

Goal: Python Classes and Objects

  • Class Declaration
  • Object Instantiation
  • Constructor and Destructor
  • Built-in Class Attributes and Functions
  • Instance, Class and Static Variables and Functions.
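The difference between instance, class, and static members can be sketched with a small class. The `Account` class and its names are hypothetical and only serve as an example.

```python
class Account:
    """Toy bank account used to show the three kinds of members."""

    bank_name = "Example Bank"  # class variable: shared by every instance

    def __init__(self, owner, balance=0):
        # Constructor: sets instance variables, which are unique per object.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        """Instance method: works on one particular account."""
        self.balance += amount
        return self.balance

    @classmethod
    def rename_bank(cls, name):
        """Class method: changes state shared by all accounts."""
        cls.bank_name = name

    @staticmethod
    def is_valid_amount(amount):
        """Static method: needs neither the instance nor the class."""
        return amount > 0


first = Account("first_owner", 100)
second = Account("second_owner")
first.deposit(50)
Account.rename_bank("Renamed Bank")
```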

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 6 (23 July 2023):

Goal: OOP Concepts and Implementation in Python

  • Data Abstraction
  • Encapsulation
  • Inheritance
  • Polymorphism.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 7 (24 July 2023):

Goal: Advanced Python Concepts

  • Higher Order Functions
  • List Comprehensions
  • Regular Expressions (RegEx)
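A short sketch of the three topics above, using only the standard library. The sample list and string are made up for illustration.

```python
import re
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Higher-order functions: map, filter and reduce take other functions as arguments.
squares = list(map(lambda n: n * n, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))
total = reduce(lambda acc, n: acc + n, numbers, 0)

# List comprehension: "square the even numbers" in one expression.
even_squares = [n * n for n in numbers if n % 2 == 0]

# Regular expression: pull the day numbers out of a log-style string.
text = "DAY 1 (18 July 2023), DAY 2 (19 July 2023)"
days = [int(d) for d in re.findall(r"DAY (\d+)", text)]
```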

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 8 (25 July 2023):

Goal: Python Connectivity with MySQL Database

  • Setting Up MySQL Connection
  • Executing SQL Queries.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 9 (26 July 2023):

Goal: Day 1 of Bank Management System

  • Database Setup
  • Python Environment Setup
  • Database Connectivity
  • Create Basic Classes
  • Customer Management.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 10 (27 July 2023):

Goal: Day 2 of Bank Management System

  • Account Management (Create Account, List Account Details)
  • Basic Error Handling (Apply Validations on Input Values)
  • Testing and Debugging (Checking Input Value Validations).

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 11 (28 July 2023):

Goal: Final Day of Project (Transfer Operations and Final Testing)

  • Transfer Operation
  • Final Testing and Documentation
  • Clean Up and Deployment.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 12 (29 July 2023):

Goal: NumPy Basics and Array Manipulation

  • Introduction to NumPy
  • Installing NumPy
  • Creating NumPy arrays
  • Array indexing and slicing
  • Array reshaping and resizing
  • Stacking and splitting arrays.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 13 (30 July 2023):

Goal: Mathematical Operations with NumPy

  • Element-wise Operations
  • Aggregation Functions
  • Linear Algebra with NumPy.
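The three groups of operations above can be sketched on two small matrices. The values are arbitrary and chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise operations act on matching positions.
elementwise_sum = a + b
elementwise_product = a * b

# Aggregation functions reduce an array along an axis (or over everything).
column_means = a.mean(axis=0)
grand_total = a.sum()

# Linear algebra: matrix product, determinant, and solving a @ x = rhs.
matrix_product = a @ b
determinant = np.linalg.det(a)
rhs = np.array([5.0, 11.0])
x = np.linalg.solve(a, rhs)
```

Note that `a * b` multiplies element by element, while `a @ b` is the matrix product.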

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 14 (31 July 2023):

Goal: Statistics Functions with NumPy

  • Descriptive statistics
  • Random number generation
  • Sorting and searching arrays

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 15 (1 Aug. 2023):

Goal: Introduction to Pandas and Data Structures in Pandas

  • Introduction to Pandas
  • Install Pandas
  • Types of Data Structures : Series, DataFrames
  • Importing and Exporting DataFrames
  • DataFrame Functions
  • Accessing DataFrames : Indexing, Slicing, loc[], iloc[].

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 16 (2 Aug. 2023):

Goal: Data Manipulation and Data Aggregation using Pandas

  • Advanced Indexing and Selection - (Label-based indexing, boolean indexing, and advanced slicing)
  • Combining DataFrames - (Concatenation, merging, and joining techniques)
  • Data Manipulation
  • Advanced Data Manipulation - (reshaping data, pivoting, and melting)
  • Data Aggregation and Grouping - (groupby() and other aggregation Functions)

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 17 (3 Aug. 2023):

Goal: Data Cleaning

  • Basic Data Cleaning and Pre-Processing:
    • Removing Duplicates
    • Fixing Wrong Data
    • Cleaning Data of Wrong Format
    • Cleaning Empty Cells
    • dropna(), fillna()
    • drop_duplicates()
  • Data Transformation - ( apply() and map() )
  • Working with Text Data - Functions of str attribute

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 18 (4 Aug. 2023):

Goal: Feature Engineering and Time Series Analysis

  • Feature Engineering:
    • Data Normalization
    • Data Scaling
    • Data Standardization
  • Time Series Analysis and Resampling:
    • Working with datetime data
    • Date offsets
    • Resampling time series data
    • Datetime index
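For the feature engineering items above, min-max normalization and standardization can be sketched in plain Python. This is an illustrative stand-in, not the library calls the notebook itself may use, and the sample values are made up.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    # Assumes the values are not all identical (hi != lo).
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # Assumes sigma is non-zero.
    return [(v - mu) / sigma for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample data
normalized = min_max_normalize(heights_cm)
standardized = standardize(heights_cm)
```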

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 19 (5 Aug. 2023):

Goal: Matplotlib Introduction and Line Plots

  • Matplotlib:
    • Installation of Matplotlib library
    • Import Matplotlib library
  • Matplotlib Pyplot:
    • Plotting x and y points
    • Plotting without line
    • Matplotlib Markers (Types, Color, Size)
    • Matplotlib Line (LineStyle, Line colors, line width)
    • Single Plot with multiple lines
    • Matplotlib Labels and Title (Create Label, Create Title, Set font properties to Title and Label, Title Position)
    • Adding Grid Lines (Line Properties of grid)
  • Matplotlib Bars:
    • Vertical Bars
    • Horizontal Bars
    • Bar colors
    • Bar width
    • Bar height

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 20 (6 Aug. 2023):

Goal: Matplotlib Scatter Plot and Histogram

  • Subplots:
    • subplot() function
    • Title for each subplot
    • Super title of Plot
  • Matplotlib Scatter Plot:
    • Create Scatter Plots
    • Compare Plots
    • Color each dot
    • ColorMap for dots
    • Combine Color, Size and Alpha values
  • Matplotlib Histograms:
    • Create Histogram
  • Matplotlib Pie Charts:
    • Create Pie Chart
    • Labels
    • startAngle
    • Explode
    • Shadow
    • Colors
    • Legend
    • Header

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 21 (7 Aug. 2023):

Goal: Seaborn Introduction

  • Seaborn:
    • Installation of Seaborn
    • Import Seaborn library
  • Different types of plots:
    • Relational Plots
    • Categorical Plots
    • Distribution Plots
    • Regression Plots
  • Categorical Plots:
    • Bar Plot
    • Count Plot
    • Box Plot
    • Violinplot
    • Stripplot
    • Swarmplot
    • Factorplot
  • Distribution Plots:
    • Histogram
    • Distplot
    • Jointplot
    • Pairplot
    • Rugplot
    • KDE Plot

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 22 (8 Aug. 2023):

Goal: Seaborn Visualization Plots - Relational and Regression Plots

  • Customizing Seaborn Plots:

    • Changing Figure Aesthetics
    • Removal of Spines
    • Changing the Figure size
    • Scaling the plots
    • Setting the Style Temporarily
    • Color Palette - (Diverging, Sequential, Default color palette)
  • Multiple Plots with Seaborn:

    • Using Matplotlib - (add_axes(), subplot(), subplot2grid() functions)
    • Using Seaborn - (FacetGrid() method, PairGrid() method)
  • Relational Plot Types:

    • relplot()
    • Scatter Plot
    • Line Plot
  • Regression Plot Types:

    • lmplot
    • RegPlot
  • Matrix Plots:

    • HeatMap
    • Clustermap

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 23 (9 Aug. 2023):

Goal: Python Fundamentals Notes

  • Introduction
    • Features
    • Applications
  • Identifiers:
    • Keywords
    • Variables and Constants
  • Operators in python
  • Data types in python
    • String data type and operations
    • List data type and operations
    • Tuple data type and operations
    • Set data type and operations
    • Dictionary data type and operations
  • Control Statements in python:
    • Decision making
    • looping statements
    • looping control statements

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 24 (10 Aug. 2023):

Goal: NumPy Fundamentals Notes

  • Introduction
    • Installation
    • Import
  • Create arrays in python
  • Array creation using NumPy Functions
    • zeros
    • ones
    • arange
    • linspace
    • eye
    • identity
    • fromiter
  • Accessing array elements
    • Indexing and Slicing
  • Random number Generation
    • rand()
    • random()
    • ranf()
    • randint()
    • randn()

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 25 (11 Aug 2023):

Goal: Pandas Revision

  • Introduction - Install, Import
  • Data Structures:
    • Series
    • DataFrames
  • DataFrames
    • Importing and Exporting
    • Functions - columns, describe(), info(), head(), tail(), isna()
    • Accessing DataFrames - loc[], iloc[],
  • Basic Data Cleaning:
    • Empty Cells
    • Wrong Format Data
    • Fixing Wrong Data
    • Removing Duplicates
  • Apply filters
    • apply()
    • map() - Using Dictionary, Series, Function for mapping

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 26 (12 Aug 2023):

Goal: Introduction to Artificial Intelligence and Machine Learning Fundamentals

  • Artificial Intelligence:
  • Machine Learning:
    • Difference between Artificial Intelligence and Machine Learning
    • Applications of Machine Learning
    • Limitations of Machine Learning
    • Types of Machine Learning
      • Supervised Learning
      • Unsupervised Learning
      • Reinforcement Learning
    • Comparisons between all types

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 27 (13 Aug 2023):

Goal: Understanding Machine Learning Workflow

  • 1. Data Preprocessing:
    • Data Cleaning
    • Feature Selection/Extraction
    • Normalization/Scaling
    • Encoding Categorical Variables
    • Splitting Data
  • 2. Model Training:
    • Selecting a Model
    • Initializing Parameters
    • Training Loop
    • Gradient Descent (for Optimization)
    • Hyperparameter Tuning
  • 3. Model Evaluation:
    • Metrics
    • Cross-Validation
    • Confusion Matrix
    • ROC and AUC
    • Overfitting and Underfitting

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 28 (14 Aug 2023):

Goal: Model Evaluation Techniques in Machine Learning

  • Cross-Validation
  • Evaluation Metrics:
    • Accuracy
    • Precision
    • Recall
    • F1-Score
    • Area Under Curve (AUC) and Receiver Operating Characteristic (ROC)
  • Confusion Matrix
  • Overfitting and Underfitting Detection:
    • Overfitting
    • Underfitting
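The evaluation metrics listed above all derive from the four cells of a binary confusion matrix. A minimal pure-Python sketch, with made-up labels and a hypothetical helper `confusion_counts`:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up labels
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # made-up predictions

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # share of predicted positives that are right
recall = tp / (tp + fn)                      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```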

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 29 (15 Aug 2023):

Goal: Diagnosing and Addressing Underfitting and Overfitting

  • Underfitting:
    • Choosing a more complex model
    • Adding more features
    • Fine-tuning hyperparameters
  • Overfitting:
    • Collect more data
    • Feature selection
    • Cross-validation
    • Regularization techniques
    • Early stopping

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 30 (16 Aug 2023):

Goal: Simple Linear Regression Implementation

  • Linear Regression Introduction
  • Simple Linear Regression:
    • Assumptions of Simple LR
    • Equation of Simple LR
    • Applications of Linear Regression
    • Working of Linear Regression
    • Finding goodness of fit
    • Examples of Linear Regression
    • Implementation of Simple Linear Regression
    • Real-world Application: Salary Prediction
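As a sketch of the equation y = slope * x + intercept and of a goodness-of-fit measure, the following fits a line by ordinary least squares and computes R². The data points are invented for illustration and are not the salary dataset used in the notebook.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


years_experience = [1, 2, 3, 4, 5]    # made-up inputs
salary = [30, 35, 45, 50, 60]         # made-up targets (arbitrary units)

slope, intercept = fit_simple_linear_regression(years_experience, salary)
r2 = r_squared(years_experience, salary, slope, intercept)
predicted_for_6_years = slope * 6 + intercept
```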

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 31 (17 Aug 2023):

Goal: Multiple Linear Regression and Implementation using Student Performance Analysis

  • Multiple Linear Regression (MLR):
    • Key points of MLR
    • Equation of MLR
    • Assumptions of MLR
    • Implementation of MLR using Python
    • Real-world Application: Student Performance Analysis

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 32 (18 Aug 2023):

Goal: Classification in Machine Learning

  • Classification

  • Types of Learners:

    • Lazy Learners: Store the training dataset first and do no modelling until test data arrives.
    • Eager Learners: Build a classification model from the training dataset before receiving any test data.
  • Types of Classification Algorithms:

    • Logistic Regression
    • Decision Trees
    • Random Forest
    • Support Vector Machines (SVM)
    • K-Nearest Neighbors (KNN)
    • Naive Bayes
    • Neural Networks
  • Terminologies in Classification:

    • Features and Labels
    • Training and Testing Data
    • Confusion Matrix
    • Precision, Recall, F1-Score
    • ROC and AUC Curve
  • Types of Classification:

    • Binary Classification: Two classes (e.g., Yes/No)
    • Multiclass Classification: Multiple distinct classes (e.g., Cat/Dog/Horse)
  • Model Evaluation Techniques for Classification (used to assess how well a model fits):

    • Accuracy
    • Precision and Recall
    • F1-Score
    • ROC Curve and AUC
    • Confusion Matrix

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 33 (19 Aug 2023):

Goal: Logistic Regression Implementation

  • Logistic Regression:
    • Logistic Function (Sigmoid Function)
    • Assumptions of Logistic Regression
    • Types of Logistic Regression:
      • Binary / Binomial
      • Multinomial
      • Ordinal
    • Terminologies involved in Logistic Regression
    • Implementation of Logistic Regression
  • Difference between Linear Regression and Logistic Regression
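The logistic (sigmoid) function is what turns a linear score into a probability, which is the main difference from linear regression. A minimal sketch follows; the `weight` and `bias` values are hypothetical and hand-picked rather than learned from data.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Probability of the positive class for a single feature x."""
    return sigmoid(weight * x + bias)


def predict_label(x, weight, bias, threshold=0.5):
    """Binary decision: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


weight, bias = 1.5, -6.0  # hypothetical parameters, not fitted values
labels = [predict_label(x, weight, bias) for x in (2, 4, 6)]
```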

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 34 (20 Aug 2023):

Goal: Decision Tree Concepts

  • Decision Tree:
    • Components of a Decision Tree
      • Root Node
      • Internal Nodes
      • Leaf Nodes
    • Attribute Selection Measures(ASM):
      • Entropy
      • Information Gain
      • Gini Index
    • How Decision Trees Work
    • Advantages of Decision Trees
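The attribute selection measures above can be written out directly. This sketch computes entropy, the Gini index, and the information gain of one candidate split on made-up labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini index: chance of mislabelling a randomly drawn item."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5                 # made-up node with mixed classes
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]  # one candidate split
gain = information_gain(parent, split)
```

A tree builder would evaluate this gain (or the drop in Gini index) for every candidate split and pick the largest.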

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 35 (21 Aug 2023):

Goal: Decision Tree Implementation

  • Decision Tree Implementation Setup:
    • Data Pre-processing
    • Model Training
    • Predicting the Results
    • Model Evaluation Techniques
  • Examples for Decision Tree Implementation:
    • IRIS Flower Classification
    • Red Wine Quality Prediction

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 36 (22 Aug 2023):

Goal: Ensemble Methods

  • Ensemble Methods:
    • Bagging
    • Boosting
    • Stacking
    • Advantages of Ensemble Methods

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 37 (23 Aug 2023):

Goal: Gradient Boosting in Machine Learning

  • Gradient Boosting in Machine Learning:
    • What is Gradient Boosting
    • Key Components of Gradient Boosting
    • How Gradient Boosting Works
    • Benefits of Gradient Boosting

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 38 (24 Aug 2023):

Goal: AdaBoost and XGBoost

  • AdaBoost and XGBoost:
    • AdaBoost (Adaptive Boosting)
    • XGBoost (Extreme Gradient Boosting)
    • Advantages of AdaBoost and XGBoost
    • Applications

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 39 (25 Aug 2023):

Goal: Random Forests Introduction

  • Random Forests:
    • What are Random Forests
    • Key Components of Random Forests
    • How Random Forests Work
    • Benefits of Random Forests
    • Real-world Applications of Random Forests

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 40 (26 Aug 2023):

Goal: Random Forest Implementation and Hyperparameter Tuning

  • Random Forest Implementation:
    • Step-by-Step Approach
    • IRIS Flower Prediction
    • Red Wine Quality Prediction
  • Hyperparameter Tuning:
    • Unlocking Model Potential
    • GridSearchCV
    • RandomizedSearchCV

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 41 (27 Aug 2023):

Goal: Decision Tree and Random Forest Example

  • Decision Tree in Action
  • Enchantment of Random Forests
  • Social Media Ads prediction

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 42 (28 Aug 2023):

Goal: Support Vector Machine (SVM) Introduction

  • Introduction to SVM
  • Terminologies used in SVM
  • Advantages of SVM
  • Limitations of SVM

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 43 (29 Aug 2023):

Goal: SVM Implementation

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 44 (30 Aug 2023):

Goal: SVM Regression Implementation

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 45 (31 Aug 2023):

Goal: Introduction to KNN

  • KNN Introduction
  • Distance Metrics:
    • Euclidean Distance
    • Manhattan Distance
    • Minkowski Distance
  • How KNN works
  • How to choose value of 'K'
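The three distance metrics are closely related: Minkowski distance with p = 1 is Manhattan distance and with p = 2 is Euclidean distance. The sketch below shows this and a bare-bones majority-vote KNN on made-up points; `knn_predict` is a hypothetical helper, not a library function.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    return minkowski(a, b, 2)


def manhattan(a, b):
    return minkowski(a, b, 1)


def knn_predict(training, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [  # made-up labelled points
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
label_near_a = knn_predict(training, (2, 2), k=3)
label_near_b = knn_predict(training, (6, 5), k=3)
```

An odd `k` is a common choice for two-class problems because it avoids tied votes.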

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 46 (1 Sept 2023):

Goal: KNN Implementation

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 47 (2 Sept 2023):

Goal: KNN Hyperparameter Tuning

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 48 (3 Sept 2023):

Goal: ML Fundamentals Revision

  • What is AI
  • What is ML
  • Machine Learning
  • Model Evaluation Techniques in ML
    • Classification: Accuracy Score, Confusion Matrix, Classification Report
    • Regression: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error
  • Exploratory Data Analysis (EDA)
  • Handling Outliers
    • Removing Outliers
    • Transforming Values
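The three regression metrics named above differ only in how they aggregate the prediction errors. A pure-Python sketch on made-up values:

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared error; large errors are penalized more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the units of the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


y_true = [3, 5, 7, 9]    # made-up targets
y_pred = [2, 5, 8, 11]   # made-up predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```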

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 49 (4 Sept 2023):

Goal: 5G Resource Allocation Capstone Project - MLR, SVR and KNN Regression Models

  • Resource Allocation in 5G Network Service Project:
    • Data Pre-Processing
    • Implementation:
      • Polynomial Regression
      • SVM Regression
      • KNN Regression
    • Model Evaluation:
      1. Mean Absolute Error
      2. Mean Squared Error
      3. Root Mean Squared Error
    • Kaggle Notebook : Link to Notebook
    • Comparison of Model Performances (Multiple Bar Charts)

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 50 (5 Sept 2023):

Goal: Capstone Project - Gender Classification - LR, DT, RF, SVM and KNN

  • Gender Classification Project:
    • Data Pre-Processing
    • Implementation:
      • Logistic Regression
      • Decision Tree
      • Random Forest
      • SVM Classification
      • KNN Classification
    • Model Evaluation:
      1. Accuracy Score
      2. Confusion Matrix
      3. Classification Report
    • Kaggle Notebook : Link to Notebook
    • Comparison of Model Performances (Bar Chart)

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 51 (6 Sept 2023):

Goal: Introduction to Cross-Validation

  • Introduction to Cross-Validation
  • What is Cross Validation
  • Why is Cross Validation Important
  • Advantages of Cross Validation
  • Limitations of Cross Validation
  • Types of Cross-Validation:
    • Leave-One-Out Cross-Validation (LOOCV)
    • Leave-P-Out Cross Validation (LPOCV)
    • K-Fold Cross-Validation
    • Stratified K-Fold Cross-Validation
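To make the K-Fold idea concrete, here is a small sketch that only generates the train/test index splits, without shuffling or stratification. The generator `k_fold_indices` is a hypothetical helper written for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(10, 3))
test_sizes = [len(test) for _, test in folds]
```

Every sample lands in exactly one test fold. Setting `k` equal to `n_samples` gives Leave-One-Out Cross-Validation.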

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 52 (7 Sept 2023):

Goal: Cross-Validation Implementation

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 53 (8 Sept 2023):

Goal: Perform EDA Operation on Different Datasets

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 54 (9 Sept 2023):

Goal: Introduction to Dimensionality Reduction

  • The Curse of Dimensionality
  • The Importance of Dimensionality Reduction
  • Dimensionality Reduction Techniques:
    • Feature Selection
    • Feature Extraction
    • Dimension Reduction

  • Advantages of Dimensionality Reduction
  • Limitations of Dimensionality Reduction

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 55 (10 Sept 2023):

Goal: Introduction to Principal Component Analysis (PCA)

  • Some common terms used in PCA algorithm
  • Uses of PCA
  • Advantages of Principal Component Analysis
  • Limitations of Principal Component Analysis

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 56 (11 Sept 2023):

Goal: Steps in PCA (Principal Component Analysis)

  • Step 1 : Covariance Matrix Computation
  • Step 2 : Compute Eigenvalues and Eigenvectors of Covariance Matrix to Identify Principal Components

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 57 (12 Sept 2023):

Goal: Solve Example of PCA

  • Pre-processed Data
  • Calculated Covariance Matrix
  • Eigenvalues and Eigenvectors
  • Sorted Eigenvalues
  • Select Principal Components
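The steps above can be sketched with NumPy on a small two-feature dataset. The numbers are made up for illustration and are not the example solved in the notebook.

```python
import numpy as np

# Made-up dataset: rows are samples, columns are features.
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Pre-process: center each feature on zero.
centered = data - data.mean(axis=0)

# Covariance matrix of the features.
cov = np.cov(centered, rowvar=False)

# Eigenvalues and eigenvectors (eigh is meant for symmetric matrices).
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort from largest to smallest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Select principal components and project the data onto them.
explained_ratio = eigenvalues / eigenvalues.sum()
n_components = 1
projected = centered @ eigenvectors[:, :n_components]
```

The variance of the projected data equals the largest eigenvalue, which is why sorting by eigenvalue ranks the components by how much variance they keep.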

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 58 (13 Sept 2023):

Goal: PCA Implementation using Scikit-Learn

  • Data Preparation
  • Importing Scikit-learn
  • Standardization
  • PCA Implementation
  • Explained Variance
  • Dimensionality Reduction
  • Visualization
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 59 (14 Sept 2023):

Goal: Introduction to Feature Selection

  • What is Feature Selection?
  • Why is Feature Selection Necessary?
  • Techniques in Feature Selection
    • Univariate feature selection
    • Feature importance from tree-based models
    • Recursive Feature Elimination (RFE)
    • L1-based feature selection
    • Correlation-based feature selection
  • Steps in Feature Selection:
    • Data Pre-Processing
    • Feature Scoring
    • Feature Selection
  • Advantages of Feature Selection:
    • Improved model performance
    • Faster training and prediction
    • Enhanced model interpretability
    • Reduced risk of overfitting
    • Easier visualization of data
  • Limitations of Feature Selection:
    • It may result in information loss.
    • It can be challenging to decide which features to select.
    • Some methods might not work well for all types of data.

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 60 (15 Sept 2023):

Goal: Feature Selection : Filter Methods

  • Introduction to Filter Methods

  • Steps in Filter Methods:

    1. Data Pre-Processing
    2. Feature Scoring
    3. Feature Selection
  • Common Techniques in Filter Methods:

    1. Correlation-based Feature Selection
    2. Information Gain
    3. Chi-square Test
    4. Fisher's Score
    5. Missing Value Ratio
  • Advantages of Filter Methods:

    1. Simplicity
    2. Speed
    3. Independence
  • Limitations of Filter Methods:

    1. Independence
    2. Suboptimal Results
  • Kaggle Notebook
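As one example of a filter method, correlation-based selection scores each feature against the target and keeps those above a threshold, without training any model. The feature names, data, and threshold below are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_by_correlation(features, target, threshold):
    """Feature scoring followed by feature selection."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = [name for name, score in scores.items() if score >= threshold]
    return scores, selected


features = {  # hypothetical feature columns
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 8, 9, 7],
}
target = [52, 58, 65, 71, 78]  # hypothetical exam scores

scores, selected = select_by_correlation(features, target, threshold=0.5)
```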

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 61 (16 Sept 2023):

Goal: Feature Selection : Wrapper Methods

  • Introduction to Wrapper Methods

  • Steps in Wrapper Methods:

    1. Subset Selection
    2. Model Building
    3. Model Evaluation
  • Common Techniques in Wrapper Methods:

    1. Forward Selection Method
    2. Backward Elimination Method
    3. Exhaustive Feature Selection Method
    4. Recursive Feature Elimination Method
  • Advantages of Wrapper Methods:

    1. Optimal Features
    2. Model-Specific
  • Limitations of Wrapper Methods:

    1. Computationally Intensive
    2. Model Dependency
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 62 (17 Sept 2023):

Goal: Feature Selection : Embedded Methods

  • Introduction to Embedded Methods

  • Steps in Embedded Methods:

    1. Feature Selection While Building
    2. Model Training
    3. Feature Importance Assessment
  • Common Techniques in Embedded Methods:

    1. Random Forest Importance
    2. Lasso (L1 Regularization)
    3. Ridge (L2 Regularization)
    4. Elastic Net (L1 and L2 Regularization)
  • Advantages of Embedded Methods:

    1. Feature Relevance
    2. Model Compatibility
  • Limitations of Embedded Methods:

    1. Model Dependency
    2. May Miss Correlations
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 63 (18 Sept 2023):

Goal: Exploratory Data Analysis (EDA) on IPL All Time Best Batsman Trending Dataset

  • Key EDA Operations Performed:

    1. Data Loading
    2. Data Exploration
    3. Data Visualization
    4. Statistical Insights
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 64 (19 Sept 2023):

Goal: Support Vector Regression (SVR) on Used Car Price Prediction

  • Key SVR Operations Performed:

    1. Data Loading
    2. Data Pre-processing
    3. Feature Selection
    4. Splitting Data
    5. SVR Model Building
    6. Model Training
    7. Model Evaluation
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 65 (20 Sept 2023):

Goal: Movie Recommendations Using Collaborative Filtering

  • Key Operations Performed:

    1. Data Loading
    2. Data Pre-processing
    3. Collaborative Filtering
    4. Movie Recommendations
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 66 (21 Sept 2023):

Goal: Simple Linear Regression for Insurance Predictions

  • Key Operations Performed:

    1. Data Loading
    2. Data Exploration
    3. Linear Regression Implementation
    4. Model Evaluation
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 67 (22 Sept 2023):

Goal: Simple Linear Regression for Salary Predictions

  • Key Operations Performed:

    1. Data Loading
    2. Data Exploration
    3. Linear Regression Implementation
    4. Model Evaluation
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 68 (23 Sept 2023):

Goal: Exploratory Data Analysis (EDA) for Gym Exercises Data

  • Key Operations Performed:

    1. Data Loading
    2. Data Exploration
    3. Data Visualization
    4. Insights Extraction
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 69 (24 Sept 2023):

Goal: Exploratory Data Analysis (EDA) for Life Expectancy Data

  • Key Operations Performed:

    1. Data Loading
    2. Data Exploration
    3. Data Visualization
    4. Insights Extraction
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 70 (25 Sept 2023):

Goal: Exploratory Data Analysis (EDA) on Predicting Student Dropouts

  • Key Operations Performed:

    1. Data Loading
    2. Data Exploration
    3. Data Visualization
    4. Insights Extraction
  • Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 71 (26 Sept 2023):

Goal: Introduction to Clustering in ML

  • Intro to Clustering
  • Types of Clustering:
    1. Partitioning Clustering
    2. Density-Based Clustering
    3. Distribution Model-Based Clustering
    4. Hierarchical Clustering
    5. Fuzzy Clustering

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 72 (27 Sept 2023):

Goal: Clustering Algorithms in Machine Learning

  • Commonly Used Clustering Algorithms:
    • K-means Algorithm
    • Hierarchical Clustering
    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
    • Agglomerative Clustering
    • Gaussian Mixture Model (GMM)
  • Applications of Clustering:
    • Customer Segmentation
    • Image Compression
    • Anomaly Detection
    • Document Classification
  • Advantages of Clustering:
    • Pattern Discovery
    • Data Reduction
    • Scalability
    • Interpretability

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 73 (28 Sept 2023):

Goal: Implementing K-means Clustering

  • K-means Clustering:
    • Initialization
    • Assignment
    • Update Centroids
    • Repeat
  • Customer Clustering : Kaggle Notebook
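The four K-means steps above (initialization, assignment, centroid update, repeat) can be sketched in plain Python for 2-D points. The points and starting centroids are made up, and the initial centroids are passed in explicitly so the result is deterministic.

```python
def k_means(points, centroids, max_iters=100):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    centroids = list(centroids)  # Initialization: caller-supplied starting centroids
    assignments = []
    for _ in range(max_iters):
        # Assignment: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Repeat until the assignments stop changing.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]  # made-up data
centroids, assignments = k_means(points, centroids=[points[0], points[3]])
```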

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 74 (29 Sept 2023):

Goal: K-means Clustering Implementation

  • K-means Clustering:
    • Initialization
    • Assignment
    • Update Centroids
    • Repeat
  • Credit Card Clustering : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 75 (30 Sept 2023):

Goal: Visualizing Clusters Distribution for 30 Datasets

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 76 (1 Oct 2023):

Goal: Hierarchical Clustering Implementation

GitHub Repository: Source Code

LinkedIn post: Daily Update


DAY 77 (2 Oct 2023):

Goal: Hierarchical Clustering Concepts

  • What Can We Achieve with Hierarchical Clustering:
    • Hierarchical Insights
    • Data Exploration
    • Decision Support

GitHub Repository: Source Code

LinkedIn post: Daily Update



Languages

  • Jupyter Notebook 99.7%
  • Other 0.3%