The challenge is to use Yelp data to understand which factors contribute to a restaurant's success, and to predict the success of new or existing restaurants from their features and attributes.
Make sure the following are installed on your machine:
- Python 3.8+
- pip (Python package installer)
This project uses Yelp's extensive dataset to explore the factors that influence the success of businesses. We aimed to develop a predictive model that forecasts the potential success of new and existing businesses from the attributes available in the data. The model is intended to help business owners, investors, and analysts make informed decisions.
- Clone the repository:
git clone https://github.com/ashrithagoramane/INFO7390.git
- Create a virtual environment and activate it:
python3 -m venv .venv
source .venv/bin/activate
- Install the required packages (Optional)
pip install -r requirements.txt
- Data Preprocessing: Handles missing values, data standardization, and outlier detection.
- Exploratory Data Analysis (EDA): Visual and statistical analysis to identify patterns and anomalies.
- Feature Engineering: Enhances features to improve model accuracy.
- Model Training: Includes Linear Regression, Decision Trees, Random Forest, Gradient Boosting, and XGBoost.
- Performance Evaluation: Uses metrics to evaluate and compare model performance.
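
The workflow in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the three feature columns (review count, price level, open flag) are hypothetical stand-ins for Yelp attributes, and the preprocessing is limited to median imputation and standardization. XGBoost is left out because it ships as a separate package; only scikit-learn models are compared here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for Yelp data (hypothetical columns, not the real schema):
# review_count, price_level, is_open -> star rating in [1, 5].
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.integers(1, 1000, n).astype(float),  # review_count
    rng.integers(1, 5, n).astype(float),     # price_level (1-4)
    rng.integers(0, 2, n).astype(float),     # is_open (0 or 1)
])
noise = rng.normal(0.0, 0.3, n)
y = np.clip(3.0 + 0.001 * X[:, 0] - 0.2 * X[:, 1] + 0.5 * X[:, 2] + noise, 1.0, 5.0)

# Blank out about 5% of the feature values to mimic missing data.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    # Preprocessing: fill missing values with the median, then standardize.
    pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

# Report models from lowest to highest test RMSE.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:18s} RMSE={scores['rmse']:.3f}  R2={scores['r2']:.3f}")
```

Wrapping the imputer and scaler in the same pipeline as the model means they are fitted on the training split only, which keeps information from the test split out of the preprocessing step.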
- https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
- https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html
- https://xgboost.readthedocs.io/en/stable/parameter.html
- https://www.geeksforgeeks.org/xgboost-for-regression/