This project focuses on a binary credit scoring using Exploratory Data Analysis (EDA) and predictive modeling. The goal is to analyze customer credit data, extract meaningful insights, and build a robust model to predict creditworthiness.
📁 credit_scoring_project
│── 📄 README.md # Project documentation
│── 📂 data # Raw and processed datasets
│── 📂 notebooks # Jupyter Notebooks for EDA & modeling
│── 📂 __pycache__ # Cached Python files
│── 📂 .vscode # VS Code configuration
│── 📄 .dockerignore # Docker ignore file
│── 📄 .gitignore # Git ignore file
│── 📄 Dockerfile # Docker setup
│── 📄 app.py # Application script
│── 📄 requirements.txt # Dependencies
The EDA phase includes:
- Data Cleaning: Handling missing values, outlier detection, and feature engineering.
- Statistical Summary: Descriptive statistics and data distribution analysis.
- Feature Relationships: Correlations, visualizations, and trend analysis.
- Target Variable Insights: Understanding factors influencing credit risk.
- Feature Selection & Engineering
- Baseline Models: Logistic Regression, Decision Trees
- Advanced Models: Random Forest, Logistic Regression, Gradient Boosting
- Hyperparameter Tuning & Optimization
- Model Evaluation: AUC-ROC, Precision-Recall, Confusion Matrix
Ensure you have Python 3.8+ and install dependencies:
pip install -r requirements.txt
jupyter notebook notebooks/eda.ipynb
python app.py
- Key features affecting credit risk.
- Model performance comparison.
- Business implications of predictions.
- Incorporating alternative data sources
- Explainability with SHAP/LIME
Contributions are welcome! Feel free to submit a PR or open an issue.