# Neither Private Nor Fair: Impact of Data Imbalance on Utility and Fairness in Differential Privacy

Published at the CCS'20 Privacy-Preserving ML in Practice workshop.
The paper studies how differential privacy (specifically DPSGD, Abadi et al., CCS 2016) impacts model performance for underrepresented groups. We study how different levels of imbalance in the data affect the accuracy and fairness of the model's decisions under different levels of privacy, and demonstrate that even small imbalances and loose privacy guarantees can cause disparate impact.
Configure the environment by running `pip install -r requirements.txt`. We use Python 3.7 and an Nvidia Titan X GPU.
`playing.py` serves as the entry point for the code. It reads the parameters used in the paper from `utils/params.yaml` and writes a graph to TensorBoard. For sentiment prediction we use `playing_nlp.py`.
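The parameter file can be loaded and inspected directly. A minimal sketch, assuming nothing beyond the file path (the key names inside depend on the experiment):

```python
import yaml

# Load the experiment configuration consumed by playing.py.
# Key names depend on the experiment; see utils/params.yaml for the real ones.
with open("utils/params.yaml") as f:
    params = yaml.safe_load(f)

print(params)
```

Training is then launched with `python playing.py`, and the resulting logs can be viewed with `tensorboard --logdir <log dir>` (the log directory is whatever the parameters configure).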
Datasets:

- MNIST (part of PyTorch's torchvision)
- Diversity in Faces (obtained from IBM)
- iNaturalist
- UTKFace
- AAE Twitter corpus
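To make the imbalance settings concrete, here is a minimal sketch of constructing an artificially imbalanced MNIST training set by keeping only a fraction of one class. This is an illustration, not the repo's code; the class index and fraction are placeholder values:

```python
import torch
from torchvision import datasets, transforms

def make_imbalanced(dataset, minority_class=8, keep_fraction=0.1, seed=0):
    """Keep only keep_fraction of the examples of minority_class."""
    g = torch.Generator().manual_seed(seed)
    targets = torch.as_tensor(dataset.targets)
    minority = torch.nonzero(targets == minority_class).flatten()
    kept = minority[torch.randperm(len(minority), generator=g)][: int(keep_fraction * len(minority))]
    others = torch.nonzero(targets != minority_class).flatten()
    return torch.utils.data.Subset(dataset, torch.cat([others, kept]).tolist())

mnist = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
train_set = make_imbalanced(mnist)  # ~90% of class 8 removed
```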
For privacy accounting we use `compute_dp_sgd_privacy.py`, copied from the public TensorFlow Privacy repository.
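Assuming the copied script matches the upstream TensorFlow Privacy version, it exposes a `compute_dp_sgd_privacy(...)` helper that converts training hyperparameters into an (epsilon, delta) guarantee. The numbers below are illustrative:

```python
# Illustrative privacy accounting; assumes the script matches upstream TF Privacy.
from compute_dp_sgd_privacy import compute_dp_sgd_privacy

# 60k training examples, batch size 256, noise multiplier 1.1, 60 epochs.
eps, opt_order = compute_dp_sgd_privacy(
    n=60000, batch_size=256, noise_multiplier=1.1, epochs=60, delta=1e-5
)
print(f"epsilon = {eps:.2f} at delta = 1e-5 (optimal RDP order {opt_order})")
```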
The DP-FedAvg implementation is taken from a public repository.
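Independent of that repository, the core DP-FedAvg aggregation step is easy to state: clip each client's update, average, and add Gaussian noise calibrated to the clip norm. A simplified sketch (uniform client weighting, no client sampling):

```python
import numpy as np

def dp_fedavg_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each client's model update, average, then add Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng()
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
    mean_update = np.mean(clipped, axis=0)
    # Sensitivity of the average of n clipped updates is clip_norm / n.
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return mean_update + rng.normal(0.0, sigma, size=mean_update.shape)
```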
The DPSGD implementation is based on the TensorFlow Privacy repository and its accompanying papers.
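For reference, DPSGD (Abadi et al., CCS 2016) replaces the standard gradient step with per-example gradient clipping followed by Gaussian noise. The sketch below is a minimal illustration of that update, not this repo's training loop; the learning rate, clip norm, and noise multiplier are placeholders:

```python
import torch

def dpsgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One DPSGD update: clip each example's gradient, sum, add noise, average."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):  # per-example gradients (microbatch size 1)
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        # Clip this example's gradient to L2 norm clip_norm.
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
        scale = min(1.0, (clip_norm / (total_norm + 1e-12)).item())
        for s, p in zip(summed, model.parameters()):
            s += scale * p.grad
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            noise = torch.normal(0.0, noise_mult * clip_norm, size=s.shape)
            p -= lr * (s + noise) / len(xb)
```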
Citation:

If you use this code, please cite our paper (https://arxiv.org/pdf/2009.06389.pdf):

```
@article{farrand2020neither,
  title={Neither Private Nor Fair: Impact of Data Imbalance on Utility and Fairness in Differential Privacy},
  author={Farrand, Tom and Mireshghallah, Fatemehsadat and Singh, Sahib and Trask, Andrew},
  journal={arXiv preprint arXiv:2009.06389},
  year={2020}
}
```