AgnivaC/SubsampledLogisticRegression

[AAAI 2024] A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

Code repository for the paper:

Agniva Chowdhury and Pradeep Ramuhalli. A Provably Accurate Randomized Sampling Algorithm for Logistic Regression. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024.

Technical Appendix

The technical appendix of the paper is in TechnicalAppendix.pdf.

Datasets

  1. Cardiovascular disease dataset (cardio): cardio_train.csv (sourced from here)
  2. Bank customer churn prediction dataset (churn): Bank Customer Churn Prediction.csv (sourced from here)
  3. Default of credit card clients dataset (default): default of credit card clients.csv (sourced from here)

Codes

  1. To compute row leverage scores of a matrix: leverage_scores.py
  2. To perform leverage score, l2s, or uniform sampling: row_sampling.py

The code for l2s sampling has been sourced from here.
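
For readers unfamiliar with the terms above, here is a minimal sketch of row leverage scores and leverage-score row sampling using NumPy. It is not the code in leverage_scores.py or row_sampling.py: the function names `leverage_scores` and `sample_rows`, their parameters, and the synthetic matrix are hypothetical, and the 1/sqrt(n_samples * p_i) rescaling is the usual convention for sampling with replacement, assumed here rather than taken from the repository.

```python
import numpy as np


def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis of range(A)."""
    # Thin SVD: the columns of U are orthonormal and span the column space of A.
    U, sigma, _ = np.linalg.svd(A, full_matrices=False)
    # Drop directions with negligible singular values (numerical rank cutoff).
    tol = sigma.max() * max(A.shape) * np.finfo(sigma.dtype).eps
    U = U[:, sigma > tol]
    return np.sum(U ** 2, axis=1)


def sample_rows(A, probs, n_samples, seed=None):
    """Draw rows i.i.d. with replacement from probs; rescale so E[(SA)^T (SA)] = A^T A."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(A.shape[0], size=n_samples, replace=True, p=probs)
    scale = 1.0 / np.sqrt(n_samples * probs[idx])
    return idx, A[idx] * scale[:, None]


# Synthetic stand-in for a data matrix (assumption: not one of the datasets above).
A = np.random.default_rng(0).standard_normal((1000, 5))

lev = leverage_scores(A)
lev_probs = lev / lev.sum()                    # leverage score sampling
unif_probs = np.full(A.shape[0], 1.0 / A.shape[0])  # uniform sampling baseline

idx, SA = sample_rows(A, lev_probs, n_samples=200, seed=0)
```

The leverage scores of a full-column-rank matrix sum to its number of columns, which gives a quick sanity check on any implementation. Swapping `lev_probs` for `unif_probs` in the last call gives the uniform baseline with the same rescaling.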

Notebooks

To reproduce the experiments in the paper, run the following Jupyter Notebooks:

  1. For Cardiovascular disease dataset: cardio_train.ipynb
  2. For Bank customer churn prediction dataset: Bank_Customer_Churn_Prediction.ipynb
  3. For Default of credit card clients dataset: default_of_credit_card_clients.ipynb

Citation

@article{Chowdhury_Ramuhalli_2024,
  title={A Provably Accurate Randomized Sampling Algorithm for Logistic Regression},
  author={Chowdhury, Agniva and Ramuhalli, Pradeep},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={10},
  year={2024},
  pages={11597-11605},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/29042},
  doi={10.1609/aaai.v38i10.29042}
}

Please contact Agniva Chowdhury for questions or comments.