Home Credit - Credit Risk Model Stability

This repository contains code and resources for predicting loan default risk while accounting for model stability over time. The project uses data from a Kaggle competition hosted by Home Credit and focuses on predictive models that are both accurate and consistent across time periods.

Project Overview

The challenge of predicting loan default risk for individuals with limited credit history poses a significant barrier to financial inclusion. This project aims to address this problem by exploring various data preprocessing techniques, model architectures, and hyperparameter tuning strategies to develop reliable credit risk models.

The experimental design follows a three-phase approach:

Phase 1: Establish baseline models using complete datasets with few missing values.
Phase 2: Expand the feature set by incorporating additional data sources and perform feature engineering.
Phase 3: Refine model performance through hyperparameter tuning, balancing accuracy (AUC) and temporal stability.
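
As a rough illustration of the Phase 1 idea of starting from columns with few missing values, the sketch below keeps only columns whose share of missing entries stays under a threshold. The function name, the 0.25 threshold, and the toy column names are assumptions made for this example and are not taken from the project's notebooks. It uses plain Python so that it runs without dependencies; on a pandas DataFrame the per-column missing share can be read from df.isna().mean().

```python
def low_missing_columns(table, max_missing=0.25):
    """Return the names of columns whose share of missing (None) values
    is at most max_missing. `table` maps column name -> list of values.
    The threshold is a hypothetical choice, not the project's setting."""
    kept = []
    for name, values in table.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= max_missing:
            kept.append(name)
    return kept


# Toy data with hypothetical column names.
toy = {
    "income": [1200, 800, 950, 1100],       # 0% missing
    "employer": [None, None, "acme", None],  # 75% missing
    "age": [34, None, 51, 29],               # 25% missing
}
baseline_columns = low_missing_columns(toy, max_missing=0.25)
```

Columns dropped here are candidates to revisit in Phase 2, once additional tables and engineered features are brought in.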

Dataset

The project uses a dataset provided by Home Credit that covers 1,526,659 case IDs for loans taken between 2018 and 2020. The data includes features related to individuals' credit histories, previous loan applications, demographic information, and other relevant factors.

Results

In the final phase, a tuned LightGBM model was the top performer, with an AUC of 0.877 and a stability score of 0.743, outperforming Logistic Regression and the other baseline models. On the test set, the stability score was 0.56.
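
To make the stability score more concrete, here is a minimal sketch of a gini-stability style metric. It assumes the score is the mean weekly Gini (2 * AUC - 1), penalized when the weekly Gini trends downward and when it scatters around that trend. The weights 88.0 and -0.5 and the per-week grouping reflect our understanding of the competition metric and should be treated as assumptions; the function names are ours, and the project's notebooks may compute the score differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def auc(labels, scores):
    """AUC via the rank-sum formulation; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini_stability(weeks, labels, scores, w_slope=88.0, w_resid=-0.5):
    """Mean weekly Gini, plus a penalty for a falling trend and for scatter.
    Every week must contain both classes, otherwise AUC is undefined.
    The default weights are assumptions, not values confirmed by this repo."""
    by_week = defaultdict(lambda: ([], []))
    for w, y, s in zip(weeks, labels, scores):
        by_week[w][0].append(y)
        by_week[w][1].append(s)
    ginis = [2 * auc(*by_week[w]) - 1 for w in sorted(by_week)]

    # Least-squares line through (week index, weekly Gini).
    xs = list(range(len(ginis)))
    x_bar, g_bar = mean(xs), mean(ginis)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (g - g_bar) for x, g in zip(xs, ginis))
    slope = sxy / sxx if sxx else 0.0
    intercept = g_bar - slope * x_bar
    residuals = [g - (intercept + slope * x) for x, g in zip(xs, ginis)]

    # Only a negative slope is penalized; an improving model gets no bonus.
    return g_bar + w_slope * min(0.0, slope) + w_resid * pstdev(residuals)
```

Under these assumptions, a model with a high overall AUC can still receive a low stability score if its weekly Gini declines over time, which is why the two numbers are reported separately.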

Getting Started

To get started with this project, follow these steps:

Clone the repository: git clone https://github.com/nogibjj/ML_final_proj_pandas.git
Install the required dependencies: pip install -r requirements.txt
Download the dataset from the Kaggle competition: Home Credit - Credit Risk Model Stability. The data is too large to push to this repository, even with Git LFS.
Place the dataset files in the data/ directory.
Run the Jupyter notebooks or Python scripts to preprocess the data, train the models, and evaluate their performance. Feel free to use other

Contributing

Contributions to this project are welcome! If you have suggestions, bug reports, or improvements, please open an issue or submit a pull request. You can also contact me at [email protected].

License

This project is licensed under the MIT License.

Acknowledgments

Home Credit for providing the dataset and hosting the Kaggle competition.
Kaggle for hosting the competition and providing a platform for data science challenges. We also drew on the competition's discussion forums for a few helper functions and data processing tasks.

Team Members

Meixiang Du
Faraz Jawed
Divya Sharma
Adler Viton
