This repository contains a solution for the Kaggle competition "House Prices - Advanced Regression Techniques". It applies advanced regression techniques to predict house sale prices and achieves a public leaderboard score of 0.12076 (RMSE on log-transformed sale prices).
The project covers the full workflow for forecasting house prices, from rigorous data preparation and feature engineering to model tuning and ensembling, with a focus on extracting the most predictive features and deploying robust regression models.
- Data Cleaning: Imputation of missing values and initial data exploration (see the imputation sketch after this list).
- Feature Engineering (a combined sketch follows this list):
  - Simplification: Reducing the complexity of existing features to improve model interpretability.
  - Combination: Merging related features into stronger composite predictors.
  - Polynomial Features: Expanding the top 10 features into polynomial terms to capture non-linear effects.
  - Boolean Features: Introducing binary variables that encode critical thresholds.
- Data Transformation:
  - Skewness Adjustment: Applying transformations to normalize skewed distributions and improve model accuracy (see the sketch below).
- Model Development:
  - Cross-Validation: Tuning the Lasso and XGBRegressor models with cross-validation to ensure robustness and generalization (see the sketch below).
  - Residual Analysis: Analyzing residuals to understand model performance and guide the ensemble strategy.
  - Ensemble Methods: Combining predictions from several models to improve the accuracy and stability of the final predictions (a residual-analysis and blending sketch closes this list).
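To make the data-cleaning step concrete, here is a minimal imputation sketch, assuming the competition CSVs sit in a local `data/` directory; the specific rules (neighborhood-median `LotFrontage`, an explicit "None" category for garage columns) are common choices for this dataset rather than necessarily the exact ones used in the notebooks.

```python
import pandas as pd

# Load the competition data (downloaded from Kaggle into ./data/).
train = pd.read_csv("data/train.csv")

# Numerical example: impute LotFrontage with the median of its neighborhood.
train["LotFrontage"] = train.groupby("Neighborhood")["LotFrontage"].transform(
    lambda s: s.fillna(s.median())
)

# Categorical example: for garage-related columns, a missing value simply
# means "no garage", so fill with an explicit "None" category.
for col in ["GarageType", "GarageFinish", "GarageQual", "GarageCond"]:
    train[col] = train[col].fillna("None")

# Any remaining numeric gaps fall back to the column median.
num_cols = train.select_dtypes(include="number").columns
train[num_cols] = train[num_cols].fillna(train[num_cols].median())
```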
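The feature-engineering steps can be sketched as follows; the combined `TotalSF` feature, the simplified quality bands, the chosen polynomial terms, and the boolean flags are illustrative assumptions, not a list of the exact features built in this project.

```python
import pandas as pd

train = pd.read_csv("data/train.csv")

# Combination: merge related area columns into a single, stronger predictor.
train["TotalSF"] = train["TotalBsmtSF"] + train["1stFlrSF"] + train["2ndFlrSF"]

# Simplification: collapse the 10-point overall-quality scale into three bands.
train["SimplOverallQual"] = train["OverallQual"].replace(
    {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3}
)

# Polynomial features: square, cube, and square root of a few top predictors.
for col in ["OverallQual", "GrLivArea", "TotalSF"]:
    train[f"{col}-2"] = train[col] ** 2
    train[f"{col}-3"] = train[col] ** 3
    train[f"{col}-sqrt"] = train[col] ** 0.5

# Boolean features: binary flags that encode critical thresholds.
train["HasPool"] = (train["PoolArea"] > 0).astype(int)
train["Has2ndFloor"] = (train["2ndFlrSF"] > 0).astype(int)
```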
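For the skewness adjustment, the sketch below log-transforms the target and every numeric feature whose skewness exceeds 0.75; the 0.75 cutoff and the `log1p` transform are assumptions based on common practice for this competition, not necessarily the notebooks' exact settings.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

train = pd.read_csv("data/train.csv")

# The target itself is right-skewed, so model log(1 + SalePrice).
y = np.log1p(train["SalePrice"])

# Apply log1p to every numeric feature whose skewness exceeds 0.75.
numeric_feats = train.drop(columns=["Id", "SalePrice"]).select_dtypes(include="number")
skewness = numeric_feats.apply(lambda s: skew(s.dropna()))
skewed_cols = skewness[skewness.abs() > 0.75].index
train[skewed_cols] = np.log1p(train[skewed_cols])
```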
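A minimal sketch of cross-validated tuning for the two base models follows; the alpha grid, the XGBoost hyperparameters, and the quick one-hot/fillna preprocessing are placeholders rather than the tuned settings from the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

train = pd.read_csv("data/train.csv")
X = pd.get_dummies(train.drop(columns=["Id", "SalePrice"])).fillna(0)
y = np.log1p(train["SalePrice"])

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Lasso: LassoCV picks the regularization strength by cross-validation.
lasso = LassoCV(alphas=np.logspace(-4, -1, 30), cv=kf, max_iter=50000)

# XGBoost: these hyperparameters are placeholders, not the tuned values.
xgb = XGBRegressor(
    n_estimators=2000, learning_rate=0.05, max_depth=3,
    subsample=0.7, colsample_bytree=0.7, random_state=42,
)

for name, model in [("Lasso", lasso), ("XGBRegressor", xgb)]:
    scores = cross_val_score(model, X, y, cv=kf,
                             scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE = {-scores.mean():.5f} (+/- {scores.std():.5f})")
```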
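Finally, a sketch of how residual analysis can feed a simple blend of the two models; the hold-out split, the hyperparameters, and the 70/30 weighting are illustrative only, included to show the mechanics of averaging predictions on the log scale.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

train = pd.read_csv("data/train.csv")
X = pd.get_dummies(train.drop(columns=["Id", "SalePrice"])).fillna(0)
y = np.log1p(train["SalePrice"])
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Two base models (hyperparameters are placeholders, not the tuned values).
lasso = Lasso(alpha=0.0005, max_iter=50000).fit(X_tr, y_tr)
xgb = XGBRegressor(n_estimators=1000, learning_rate=0.05,
                   max_depth=3, random_state=42).fit(X_tr, y_tr)

# Residual analysis: how large are the errors, and are they correlated?
res_lasso = y_val - lasso.predict(X_val)
res_xgb = y_val - xgb.predict(X_val)
print("Residual correlation:", np.corrcoef(res_lasso, res_xgb)[0, 1])

# Ensemble: a weighted average of the two predictions (weights illustrative).
blend = 0.7 * lasso.predict(X_val) + 0.3 * xgb.predict(X_val)
print("Blended validation RMSE:", np.sqrt(np.mean((y_val - blend) ** 2)))

# For a Kaggle submission, log-scale predictions go back via np.expm1.
```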
- Python
- Scikit-Learn
- XGBoost
- Pandas, NumPy
To replicate the findings and experiment with the models:
- Clone this repository.
- Download the dataset from Kaggle.
- Install the required Python packages: `pip install -r requirements.txt`
- Run the Jupyter notebooks provided in the repository.
This project is open-sourced under the MIT License. See the LICENSE file for more details.