This project predicts car prices from the various features of a car. A comprehensive study of the dataset is presented in the Jupyter notebook included in this project.
The project is meant to highlight EDA (Exploratory Data Analysis) and its importance.
The dataset can be found in the 'Data' folder, along with separate training and testing sets if you prefer the data already split.
As a novice data scientist, I referred to an amazing Jupyter notebook for this project. The link is below:
Car Price Prediction(Linear Regression - RFE)
A linear regression model from the sklearn library was used as the model.
The model uses RFE (Recursive Feature Elimination) to eliminate the features that contribute least to predicting the dependent variable, price. In addition, features with a high VIF (Variance Inflation Factor) were eliminated to reduce multicollinearity among the remaining features.
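The two selection steps can be sketched on synthetic data. This is a minimal illustration, not the notebook's code: the data, the column layout, the choice of three retained features, and the `vif` helper are all assumptions made for the example. Note that sklearn's RFE ranks features by the magnitude of the fitted coefficients, so it behaves best when the features are on comparable scales.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the car data (hypothetical, not the project's dataset).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 5))
# Column 4 is nearly a copy of column 0, so the two are strongly collinear.
X[:, 4] = X[:, 0] + rng.normal(scale=0.01, size=n_samples)
# The target depends only on columns 0 and 1, plus a little noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# RFE refits the estimator repeatedly, dropping the feature with the smallest
# coefficient magnitude each round until n_features_to_select remain.
selector = RFE(LinearRegression(), n_features_to_select=3)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # indices of the kept columns


def vif(features: np.ndarray, index: int) -> float:
    """VIF of one column: 1 / (1 - R^2) from regressing it on the others."""
    target = features[:, index]
    others = np.delete(features, index, axis=1)
    r_squared = LinearRegression().fit(others, target).score(others, target)
    return 1.0 / (1.0 - r_squared)


vifs = [vif(X, i) for i in range(X.shape[1])]

print("columns kept by RFE:", selected)
for i, value in enumerate(vifs):
    print(f"column {i}: VIF = {value:.1f}")
```

Columns whose VIF is far above the rest are candidates for removal; a common rule of thumb treats values above 5 or 10 as a sign of problematic collinearity.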
If you want to follow this notebook and use all the features it includes, it is recommended to set up a virtual environment and install all the libraries by following the steps below.
*The following steps assume that the project, or at least requirements.txt, has already been pulled.*
- Install the virtualenv library (optional, since the venv module used in the next step ships with Python 3):
  pip install virtualenv
- Make a new project folder, navigate to it in the terminal, and run the following command to create a Python virtual environment:
  python<version> -m venv <virtual-environment-name>
- Activate the environment:
  source <virtual-environment-name>/bin/activate
- Once the environment is active, install all the required libraries:
  pip install -r requirements.txt
- To deactivate the environment, type:
  deactivate
All of the above commands were run in a Bash shell.
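Before launching the notebook, it can be useful to confirm that Python is actually running from the virtual environment. The snippet below is a small sketch using only the standard library; the `in_virtualenv` helper name is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line reports False, activate the environment again and rerun the check.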