KAN-Team/Household_Income_Analysis


Household_Income_Analysis


A full data analytics project that reviews Big Data analytic methods and clusters household income data using the K-means algorithm.

Features

  • Preprocessing:

  • [1] Analyzing the data.

  • [2] Detecting the duplicate rows of data and removing them.

  • [3] Detecting the outlier values of data and removing them.

  • [4] Visualizing the scatter and box plots.
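
The preprocessing steps above could be sketched in base R roughly as follows. This is a minimal illustration, not the project's script: the data is synthetic, the column names `income` and `tax` are hypothetical, and the 1.5 × IQR rule for outliers is an assumption, since the README does not say which criterion is used.

```r
# Synthetic stand-in for the household data; column names are hypothetical.
set.seed(42)
households <- data.frame(
  income = c(rlnorm(200, meanlog = 11, sdlog = 0.4), 5e6),  # one extreme value
  tax    = c(rlnorm(200, meanlog = 8,  sdlog = 0.5), 9e5)
)
households <- rbind(households, households[1:5, ])  # append 5 duplicate rows

# [1] Quick look at the distribution of each column.
summary(households)

# [2] Drop exact duplicate rows.
deduped <- households[!duplicated(households), ]

# [3] Keep values within 1.5 * IQR of the quartiles
# (Tukey's rule, assumed here; the project may use another criterion).
within_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * IQR(x)
  x >= q[1] - fence & x <= q[2] + fence
}
cleaned <- deduped[within_fences(deduped$income) & within_fences(deduped$tax), ]

# [4] Scatter plot and log10-scaled box plot, written to a temporary PDF.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(cleaned$income, cleaned$tax, xlab = "income", ylab = "tax")
boxplot(log10(cleaned), main = "log10-scaled values")
dev.off()
```

Writing the plots to a temporary file keeps the sketch runnable from `Rscript`; in RStudio the `pdf()` and `dev.off()` calls can be dropped to see the plots in the Plots pane.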

  • Advanced Analytics Methods:

  • [1] Applying the K-means algorithm to the data.

  • [2] Determining the best value of K using the elbow plot.

  • [3] Scaling the data with log10 to improve the clustering result.
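
As a rough sketch of these three steps, the snippet below uses only `kmeans()` from base R's `stats` package on synthetic data. The column names are hypothetical, and the K range of 1 to 10 and `nstart = 25` are illustrative choices rather than the project's settings.

```r
set.seed(1)
# Two synthetic groups on hypothetical columns; not the project's data.
households <- data.frame(
  income = c(rlnorm(100, 10.5, 0.2), rlnorm(100, 12, 0.2)),
  tax    = c(rlnorm(100, 7.5, 0.2),  rlnorm(100, 9, 0.2))
)

# [3] log10 compresses the right-skewed values so that large incomes
# do not dominate the Euclidean distances used by K-means.
scaled <- log10(households)

# [2] Total within-cluster sum of squares (WSS) for K = 1..10.
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the K after which WSS stops dropping sharply.
pdf(tempfile(fileext = ".pdf"))
plot(k_values, wss, type = "b", xlab = "K", ylab = "Total WSS")
dev.off()

# [1] Final fit with the chosen K (2 here, matching the README's choice).
fit <- kmeans(scaled, centers = 2, nstart = 25)
table(fit$cluster)
```

Using several random starts (`nstart`) makes the WSS curve less sensitive to an unlucky initialization, which matters when reading the elbow by eye.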

Visualization/Plots

Data Analysis

  • Scatter plot
    scatterplot

  • Scaled box plot
    log_boxplot

Data Clustering

  • Clustering with K = 10
    clusters_log10_plot

  • Elbow plot
    wss_without_outlier_plot

  • Clustering with the best choice of K = 2
    clusters_best_k_plot
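
A cluster plot like the K = 2 one above could be drawn with base graphics along these lines. This is only a sketch on synthetic, already log10-scaled values with hypothetical column names; the project's own figures may well be produced with `factoextra` or `ggpubr` instead.

```r
set.seed(7)
# Hypothetical, synthetic log10-scaled columns standing in for the real data.
scaled <- data.frame(
  log_income = c(rnorm(80, 4.6, 0.1), rnorm(80, 5.2, 0.1)),
  log_tax    = c(rnorm(80, 3.3, 0.1), rnorm(80, 3.9, 0.1))
)
fit <- kmeans(scaled, centers = 2, nstart = 25)

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
# One colour per cluster, with the fitted centers overlaid as crosses.
plot(scaled$log_income, scaled$log_tax, col = fit$cluster, pch = 19,
     xlab = "log10(income)", ylab = "log10(tax)")
points(fit$centers, pch = 4, cex = 2, lwd = 2)
dev.off()
```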

Prerequisites

  • [1] Install R programming language.
  • [2] Install RStudio.
  • [3] Install the needed packages.

Packages

install.packages("dplyr")
install.packages("data.table")
install.packages("Hmisc")
install.packages("ggpubr")
install.packages("factoextra")
install.packages("ClusterR")
install.packages("cluster")
