Collaborative project with Huzaifa Mir on a final project for ECON 271: Business Analytics at Beloit College

lanvymai/Youtube-Videos-View

Exploring Factors Influencing YouTube Views

This project aims to analyze various factors that influence the number of views on YouTube videos using data analysis and statistical techniques.

Tools Used

  • Python
  • GitHub
  • MacOS Terminal
  • Multivariate Regression
  • Ordinary Least Squares

Project Overview

This data science project explores the relationships between different variables and YouTube video views. The analysis includes handling missing values, correlation analysis, and various visualizations to understand the factors affecting video popularity.

The dataset

In this project, I used the Kaggle dataset Data Science Youtube Video Meta Data, collected by The ML PhD Student.

Overview of the dataset

  • 44,261 YouTube videos from 60 channels
  • Time series data from 2006-2020
  • 21 variables in total
  • Variables include channel, views, comments, duration in seconds, likes and dislikes, video quality, and caption
  • Four unique categories of interest

Key Visualizations

  1. Correlation matrix

[Figure: correlation matrix]

  2. Linear model plots of likes to views for selected video categories

[Figure: LM plot of likes to views (4 categories of interest)]

  3. Linear model plots of comments to views for selected video categories

[Figure: LM plot of comments to views]

  4. Relational plots of views, comments, and video quality for selected categories

[Figure: relational plot of views, comments, and quality]

  5. Relational plots of views, likes, and video caption for selected categories

[Figure: relational plot of views, likes, and caption]

  6. Count of video categories in the dataset

[Figure: count of categories]

  7. View distribution across all video categories

[Figure: view distribution]

  8. Distribution of views, comments, likes, and duration variables

[Figure: distributions of views, comments, likes, and duration]

  9. Distribution of log-transformed views, comments, likes, and duration variables

[Figure: log-transformed distributions]

Data Preprocessing

The project involved handling missing values in the dataset.

[Figure: missing values before imputation]

After imputing missing values:

[Figure: missing values after imputation]
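
The README does not say which imputation method was used, so the following is only a minimal sketch of one common approach: fill numeric columns with the median and text columns with the most frequent value. The function name `impute_missing`, the column names, and the toy values are all hypothetical and are not taken from the project's notebook.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric gaps filled by the median, text gaps by the mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


# Hypothetical toy rows standing in for the Kaggle data (assumed column names).
videos = pd.DataFrame(
    {
        "views": [1200.0, None, 560.0, 98000.0],
        "likes": [40.0, 15.0, None, 3100.0],
        "caption": ["true", None, "false", "false"],
    }
)

print(videos.isna().sum())   # missing values per column before imputation
cleaned = impute_missing(videos)
print(cleaned.isna().sum())  # missing values per column after imputation
```

The median is used rather than the mean because count variables such as views are typically right-skewed, so a few very popular videos would otherwise pull the filled value upward.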

Analysis Highlights

  • Exploration of relationships between likes, comments, and views for different video categories
  • Investigation of the impact of video quality and captions on view counts
  • Examination of view distribution across various video categories
  • Analysis of the distribution of key variables (views, comments, likes, duration) and their log-transformed versions
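
To illustrate why the log-transformed versions are worth examining, here is a small sketch on synthetic data. It assumes `np.log1p` (log of 1 + x, which tolerates zero counts); the README does not state which transform the project applied. The generated `views` and `likes` arrays and the `skewness` helper are hypothetical and exist only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, right-skewed counts standing in for real video statistics.
views = rng.lognormal(mean=8.0, sigma=2.0, size=1000)
likes = views * rng.lognormal(mean=-3.0, sigma=0.5, size=1000)


def skewness(x) -> float:
    """Sample skewness: third central moment over the variance to the power 1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered**3).mean() / (centered**2).mean() ** 1.5)


# log1p compresses the long right tail and is defined for zero counts.
log_views = np.log1p(views)
log_likes = np.log1p(likes)

print("skewness raw vs log:", skewness(views), skewness(log_views))
print("corr raw:", np.corrcoef(views, likes)[0, 1])
print("corr log:", np.corrcoef(log_views, log_likes)[0, 1])
```

On data like this, the raw variables are dominated by a handful of extreme values, while the transformed variables are much closer to symmetric, which makes linear relationships easier to see and to model.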

Results

The project provides insights into the factors that influence YouTube video views through various statistical analyses and visualizations.

  • Standard-quality videos are associated with more views.
  • Videos with captions tend to have fewer views.
  • Educational videos attract the most views in our model.
  • Shorter videos tend to attract more views.
  • A percentage increase in likes is associated with an increase in views.
  • Fewer comments are associated with more views.
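
The findings above come from a multivariate OLS regression. The exact specification is not given in this README, so the sketch below only shows the general technique on synthetic data: a log-log model with one dummy variable, fitted with NumPy's least-squares solver. The variable names and the coefficients in `true_beta` are hypothetical illustration values, not the project's estimates.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Synthetic regressors (assumed names, not the project's actual columns).
log_likes = rng.normal(5.0, 1.5, n)
log_duration = rng.normal(6.0, 1.0, n)
has_caption = rng.integers(0, 2, n)  # dummy variable: 1 if captioned

# Illustration-only coefficients: intercept, log_likes, log_duration, has_caption.
true_beta = np.array([3.0, 0.8, -0.2, -0.3])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), log_likes, log_duration, has_caption])
log_views = X @ true_beta + rng.normal(0.0, 0.5, n)

# Ordinary least squares: minimize the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, log_views, rcond=None)

fitted = X @ beta_hat
ss_res = np.sum((log_views - fitted) ** 2)
ss_tot = np.sum((log_views - log_views.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

for name, b in zip(["const", "log_likes", "log_duration", "has_caption"], beta_hat):
    print(f"{name:>12}: {b: .3f}")
print(f"R^2: {r_squared:.3f}")
```

In a log-log specification, the coefficient on `log_likes` reads as an elasticity: a 1% increase in likes goes with roughly that many percent change in views, holding the other regressors fixed. A library such as statsmodels would additionally report standard errors and p-values; NumPy is used here only to keep the example dependency-light.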

Future Work

Limitations of the dataset

  • Missing values
  • Missing definitions of categories
  • Multicollinearity and dislikes
  • Overrepresentation of categories
  • Overrepresentation of quality
  • Small number of channels

Potential areas for future exploration

  • Deeper analysis of specific video categories
  • Incorporation of additional variables that may affect view counts
  • Time-series analysis of view trends

Contributing

Contributions to this project are welcome. Please feel free to fork the repository, make changes, and submit pull requests.

Acknowledgments

This project was completed as the final project for ECON 275: Business Analytics at Beloit College in 2021. Special thanks to the course instructor, Prof. Disha Shende.

For more detailed information, see the full presentation.
