leonardopicchiami/kaggle_house_price_prediction

KAGGLE HOUSE SALES PRICE PREDICTION

This is a Python program for Kaggle's competition House Prices: Advanced Regression Techniques. The competition is available at: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

I took part in this Machine Learning competition on the Kaggle web platform because it was assigned as the final project of the Foundation of Data Science course in the Computer Science Master's Degree at Sapienza, University of Rome.

Kaggle Competition

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

The goal of the competition is to predict the sale price of each house. Submissions are evaluated on the Root-Mean-Squared Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price. Taking logarithms means that errors on expensive houses and errors on cheap houses affect the score equally.

The program scores 0.11297, which places it in the top 3% of the competition's leaderboard.
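The evaluation metric can be sketched in a few lines of Python. This is a minimal illustration of the formula as described, not the competition's scoring code or the code in this repository; the function name rmse_log is hypothetical.

```python
import math


def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed).

    Hypothetical helper for illustration only. Both inputs are
    equal-length sequences of strictly positive prices.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one price is required")

    # Squared difference of the logarithms, one term per house.
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2
        for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A perfect prediction gives 0.0, and lower values are better. Because the error is computed on logarithms, overestimating a price by 10% costs the same whether the house is cheap or expensive.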

Description and Requirements

Both the feature optimisation phase (feature engineering and data tidying) and the regression phase are implemented in Python 3 using an object-oriented approach. The software was developed on the Linux Mint distribution. To execute the Python script, run the following terminal command from inside the src folder:

python house_sales_prediction.py 

The program reads the train and test datasets from the dataset folder and stores the prediction result in the result_dataset directory.
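The read-and-write flow described above might look roughly like the sketch below. It is not the repository's code: the function name predict_and_save and the output file name submission.csv are hypothetical, the input names train.csv and test.csv are assumed to match the files Kaggle distributes, and the "model" is only a placeholder that predicts the median training price.

```python
from pathlib import Path

import pandas as pd


def predict_and_save(dataset_dir, result_dir,
                     train_name="train.csv",    # assumed file name
                     test_name="test.csv",      # assumed file name
                     out_name="submission.csv"):  # hypothetical file name
    """Read both datasets, predict a price per test house, save a CSV."""
    dataset_dir = Path(dataset_dir)
    result_dir = Path(result_dir)

    train = pd.read_csv(dataset_dir / train_name)
    test = pd.read_csv(dataset_dir / test_name)

    # Placeholder only: every house gets the median training price.
    # The real program applies feature engineering and regression here.
    baseline_price = train["SalePrice"].median()

    # Kaggle expects one row per test house with columns Id and SalePrice.
    submission = pd.DataFrame({"Id": test["Id"], "SalePrice": baseline_price})

    result_dir.mkdir(parents=True, exist_ok=True)
    out_path = result_dir / out_name
    submission.to_csv(out_path, index=False)
    return out_path
```

Under these assumptions, calling predict_and_save("../dataset", "../result_dataset") from the src folder would mirror the directory layout mentioned above.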

Moreover, to run the program, you need the following Python 3 libraries:

  • pandas
  • numpy
  • scipy
  • matplotlib
  • seaborn
  • scikit-learn (imported as sklearn)
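Before running the script, it can help to check that these libraries are importable. The snippet below is a small sketch using only the standard library; the names REQUIRED_MODULES and missing_modules are hypothetical and not part of this repository.

```python
import importlib.util

# Import names of the libraries listed above.
REQUIRED_MODULES = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", "sklearn"]


def missing_modules(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Note that the import name sklearn corresponds to the package that is installed under the name scikit-learn.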

License

The license for this software is: GPLv3