Code repository for the paper:
Agniva Chowdhury and Pradeep Ramuhalli. A Provably Accurate Randomized Sampling Algorithm for Logistic Regression. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024.
The Technical Appendix of the paper is provided in TechnicalAppendix.pdf.
- Cardiovascular disease dataset (cardio): cardio_train.csv (sourced from here)
- Bank customer churn prediction dataset (churn): Bank Customer Churn Prediction.csv (sourced from here)
- Default of credit card clients dataset (default): default of credit card clients.csv (sourced from here)
- To compute row leverage scores of a matrix: leverage_scores.py
- To perform leverage score, l2s, or uniform sampling: row_sampling.py
The code for l2s sampling has been sourced from here.
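The idea behind leverage_scores.py and the leverage-score and uniform options of row_sampling.py can be sketched as follows. This is a minimal numpy sketch, not the repository's code: the function names `leverage_scores` and `sample_rows`, the SVD-based computation, i.i.d. sampling with replacement, and the 1/(s p_i) reweighting are assumptions chosen for illustration, and l2s sampling is not reproduced here. The leverage score of row i is the squared norm of row i of an orthonormal basis U for the column space of A, and leverage-score sampling draws row i with probability proportional to that score.

```python
import numpy as np


def leverage_scores(A, rcond=1e-12):
    """Return the row leverage scores of A (n x d).

    The i-th score is the squared Euclidean norm of row i of U, where U is an
    orthonormal basis for the column space of A (taken here from a thin SVD).
    """
    U, sigma, _ = np.linalg.svd(A, full_matrices=False)
    # Drop directions with negligible singular values so rank-deficient A works.
    rank = int(np.sum(sigma > rcond * sigma[0]))
    return np.sum(U[:, :rank] ** 2, axis=1)


def sample_rows(A, y, s, method="leverage", rng=None):
    """Draw s rows of (A, y) i.i.d. with replacement (illustrative helper).

    method="leverage": p_i proportional to the leverage score of row i.
    method="uniform":  p_i = 1/n.
    Also returns per-row weights 1/(s * p_i), so that weighted sums over the
    sample estimate the corresponding sums over all n rows.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    if method == "leverage":
        scores = leverage_scores(A)
        p = scores / scores.sum()
    elif method == "uniform":
        p = np.full(n, 1.0 / n)
    else:
        raise ValueError(f"unknown method: {method!r}")
    idx = rng.choice(n, size=s, replace=True, p=p)
    return A[idx], y[idx], 1.0 / (s * p[idx])
```

As a quick sanity check, `leverage_scores(A).sum()` equals the rank of `A` up to rounding, and with `method="uniform"` every returned weight is `n / s`.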
To reproduce the experiments in the paper, run the following Jupyter Notebooks:
- For Cardiovascular disease dataset: cardio_train.ipynb
- For Bank customer churn prediction dataset: Bank_Customer_Churn_Prediction.ipynb
- For Default of credit card clients dataset: default_of_credit_card_clients.ipynb
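The notebooks evaluate the sampling schemes on logistic regression. As an illustration only, the sketch below fits a weighted logistic regression with Newton's method in plain numpy, so that a fit on a sampled subset (each sampled row weighted by 1/(s p_i)) can be compared with the full-data fit. The helper `fit_weighted_logreg`, the Newton solver, the tiny ridge term, and the weighting convention are assumptions for this example; they are not taken from the notebooks, which may use a different solver and different error metrics.

```python
import numpy as np


def fit_weighted_logreg(X, y, weights=None, n_iter=50, ridge=1e-8, tol=1e-10):
    """Fit logistic regression (labels y in {0, 1}) by Newton's method.

    Minimises sum_i w_i * [log(1 + exp(x_i^T b)) - y_i * x_i^T b]
    + (ridge / 2) * ||b||^2; the tiny ridge term only keeps the Hessian invertible.
    """
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    beta = np.zeros(d)
    for _ in range(n_iter):
        z = np.clip(X @ beta, -30.0, 30.0)        # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))              # predicted probabilities
        grad = X.T @ (w * (p - y)) + ridge * beta
        # Hessian: X^T diag(w_i p_i (1 - p_i)) X + ridge * I
        H = X.T @ (X * (w * p * (1.0 - p))[:, None]) + ridge * np.eye(d)
        step = np.linalg.solve(H, grad)
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta
```

Comparing the coefficients (or predicted probabilities) from a weighted fit on a sampled subset with those from `fit_weighted_logreg(X, y)` on all rows gives a rough view of the approximation error; scaling all weights by a constant leaves the solution essentially unchanged, so only the relative weights matter.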
@article{Chowdhury_Ramuhalli_2024,
title={A Provably Accurate Randomized Sampling Algorithm for Logistic Regression},
author={Chowdhury, Agniva and Ramuhalli, Pradeep},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={10},
year={2024},
pages={11597--11605},
url={https://ojs.aaai.org/index.php/AAAI/article/view/29042},
doi={10.1609/aaai.v38i10.29042}
}
Please contact Agniva Chowdhury for questions or comments.