This project aims to analyze various factors that influence the number of views on YouTube videos using data analysis and statistical techniques.
- Python
- GitHub
- macOS Terminal
- Multivariate Regression
- Ordinary Least Squares (OLS)
This data science project explores the relationships between different variables and YouTube video views. The analysis includes handling missing values, correlation analysis, and various visualizations to understand the factors affecting video popularity.
In this project, I used the Kaggle dataset Data Science Youtube Video Meta Data, collected by The ML PhD Student:
- 44,261 YouTube videos from 60 channels
- Time-series data from 2006 to 2020
- 21 variables in total
- Variables include channel, views, comments, duration in seconds, likes and dislikes, video quality, and captions, among others
- Four unique categories of interest
- Correlation matrix
- Linear model plots of likes to views for selected video categories
- Linear model plots of comments to views for selected video categories
- Relational plots of views, comments, and video quality for selected categories
- Relational plots of views, likes, and video caption for selected categories
- Count of video categories in the dataset
- View distribution across all video categories
- Distribution of views, comments, likes, and duration variables
- Distribution of log-transformed views, comments, likes, and duration variables
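
As a rough illustration of the log transformation and correlation matrix listed above, here is a minimal sketch on synthetic data. The column names (`views`, `likes`, `comments`, `duration`) and the generated values are assumptions made for the example, not the actual Kaggle columns or results, and `log1p` is one common choice of transform rather than necessarily the one used in the project.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset; column names are assumptions.
rng = np.random.default_rng(0)
n = 500
likes = rng.lognormal(mean=5.0, sigma=1.5, size=n)
df = pd.DataFrame({
    "views": likes * rng.lognormal(mean=3.0, sigma=0.5, size=n),
    "likes": likes,
    "comments": likes * rng.lognormal(mean=-2.0, sigma=0.7, size=n),
    "duration": rng.lognormal(mean=6.0, sigma=0.8, size=n),
})

# log1p computes log(1 + x), which stays defined for zero counts.
log_df = np.log1p(df).add_prefix("log_")

# Pearson correlation matrix of the log-transformed variables.
corr = log_df.corr()
print(corr.round(2))

# Skewness before and after the transform, to show why logs help here.
print(df["views"].skew(), log_df["log_views"].skew())
```

Count variables such as views and likes are usually heavily right-skewed, so their raw distributions are dominated by a few very popular videos; the log-transformed versions are typically much closer to symmetric.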
The project involved handling missing values in the dataset.
After imputing missing values:
- Exploration of relationships between likes, comments, and views for different video categories
- Investigation of the impact of video quality and captions on view counts
- Examination of view distribution across various video categories
- Analysis of the distribution of key variables (views, comments, likes, duration) and their log-transformed versions
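
The imputation method is not described here, so the following is only a sketch of one common approach: filling numeric gaps with the column median and categorical gaps with the most frequent value. The sample rows and the column names (`views`, `likes`, `quality`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; values and column names are illustrative only.
df = pd.DataFrame({
    "views": [1200.0, 560.0, np.nan, 89000.0, 4300.0],
    "likes": [40.0, np.nan, 15.0, 2100.0, 130.0],
    "quality": ["hd", "sd", None, "hd", "hd"],
})

# Count missing values per column before imputing.
missing_before = df.isna().sum()

imputed = df.copy()
# Numeric columns: fill with the column median, which is robust to skew.
for col in ["views", "likes"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())
# Categorical column: fill with the most frequent value.
imputed["quality"] = imputed["quality"].fillna(imputed["quality"].mode()[0])

print(missing_before)
print(imputed)
```

The median is used instead of the mean in this sketch because a handful of viral videos can pull the mean far above a typical value.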
The project provides insights into the factors that influence YouTube video views through various statistical analyses and visualizations.
- Standard-quality videos are associated with more views.
- Videos with captions are associated with fewer views.
- Educational videos attract the most views in our model.
- Shorter videos tend to attract more views.
- Percentage increases in likes are associated with increases in views.
- Fewer comments are correlated with more views.
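
To show how findings like these can be read off a multivariate OLS regression, here is a minimal sketch on synthetic data using NumPy's least-squares solver. The variable names, the model specification, and the "true" coefficients are all assumptions chosen for the example; they are not the project's actual model or estimates.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Synthetic predictors (assumed names, not the real data).
log_likes = rng.normal(5.0, 1.5, n)
log_duration = rng.normal(6.0, 0.8, n)
has_caption = rng.integers(0, 2, n)  # 0/1 dummy variable

# Made-up coefficients used only to generate the toy outcome.
true_beta = np.array([2.0, 0.9, -0.2, -0.3])
X = np.column_stack([np.ones(n), log_likes, log_duration, has_caption])
log_views = X @ true_beta + rng.normal(0.0, 0.5, n)

# OLS: choose beta to minimise the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, log_views, rcond=None)

names = ["intercept", "log_likes", "log_duration", "has_caption"]
for name, b in zip(names, beta_hat):
    print(f"{name:>12}: {b: .3f}")
```

In a log-log specification like this one, the coefficient on `log_likes` is an elasticity: a 1% increase in likes goes with roughly that many percent more views, holding the other variables fixed. The coefficient on a dummy such as `has_caption` is the approximate proportional difference in views between the two groups.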
- Missing values
- Missing definitions of categories
- Multicollinearity and dislikes
- Overrepresentation of categories
- Overrepresentation of quality
- Small number of channels
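
One standard way to diagnose the multicollinearity limitation is the variance inflation factor (VIF), defined for predictor j as 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below uses a hypothetical helper `vif` and synthetic data in which dislikes are built to track likes closely. This is an assumed scenario for illustration, not a measurement from the dataset.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
n = 1000
log_likes = rng.normal(5.0, 1.5, n)
# Constructed to be strongly tied to likes (assumption for the demo).
log_dislikes = 0.9 * log_likes + rng.normal(0.0, 0.3, n)
log_duration = rng.normal(6.0, 0.8, n)  # generated independently

vifs = vif(np.column_stack([log_likes, log_dislikes, log_duration]))
for name, v in zip(["log_likes", "log_dislikes", "log_duration"], vifs):
    print(f"{name:>12}: VIF = {v:.1f}")
```

A common rule of thumb treats a VIF above 5 or 10 as a sign that a predictor is largely explained by the others, in which case dropping or combining variables is worth considering.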
- Deeper analysis of specific video categories
- Incorporation of additional variables that may affect view counts
- Time-series analysis of view trends
Contributions to this project are welcome. Please feel free to fork the repository, make changes, and submit pull requests.
This project was completed as the final project for ECON 275: Business Analytics at Beloit College in 2021. Special thanks to the course instructor, Prof. Disha Shende.
For more detailed information and to view the full presentation, visit presentation