This repository hosts the code for our work titled "Fairness Auditing with Multi-agent Collaboration" which will appear in the Proceedings of the 27th European Conference on Artificial Intelligence (ECAI), 2024.
Create a new conda environment using the provided environment.yml file.
conda env create -f environment.yml
Activate the environment before running any code below.
conda activate audits
# Run from root directory
python scripts/preprocess_german_credit.py
The script downloads the dataset, preprocesses it, and saves it in the data/german_credit folder. The preprocessed dataset is saved as data/german_credit/features.csv and data/german_credit/labels.csv.
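As a quick sanity check, you can load the preprocessed files with pandas. This is a minimal sketch (not part of the repository) that only assumes the paths above:

```python
import pandas as pd

# Load the preprocessed German Credit data (paths as produced by the script above).
features = pd.read_csv("data/german_credit/features.csv")
labels = pd.read_csv("data/german_credit/labels.csv")

print(features.shape, labels.shape)
print(features.columns.tolist())
```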
Manually create the folder data/propublica, then download compas-scores-two-years.csv from the ProPublica GitHub repository and save it in the data/propublica folder. Finally, run the following script to complete the preprocessing.
# Run from root directory
python scripts/preprocess_propublica_dataset.py
You should now have the preprocessed dataset in the data/propublica folder as data/propublica/features.csv and data/propublica/labels.csv.
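If you prefer to script the download step, the sketch below fetches the file with Python's standard library. The raw-file URL points at ProPublica's compas-analysis repository; verify it still resolves before relying on it.

```python
import os
import urllib.request

# Assumed raw-file URL from ProPublica's compas-analysis repository.
URL = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")

# Create the expected folder and download the CSV into it.
os.makedirs("data/propublica", exist_ok=True)
urllib.request.urlretrieve(URL, "data/propublica/compas-scores-two-years.csv")
```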
Manually create the folder data/folktables before running the following. We will first download the dataset using the script below. Note that the download takes a while as the dataset is large. Additionally, you will need a machine with a large amount of memory (around 150 GB) to run the download script.
# Run from root directory
python scripts/download_folk_tables.py
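For reference, this kind of download is typically done through the folktables package. The sketch below illustrates what the script plausibly does; the prediction task (ACSIncome here) is our assumption and may differ from what scripts/download_folk_tables.py actually uses.

```python
from folktables import ACSDataSource, ACSIncome

# Assumed setup: 2018 5-year person survey stored under data/
# (matching the data/2018/5-year folder mentioned below).
data_source = ACSDataSource(survey_year="2018", horizon="5-Year",
                            survey="person", root_dir="data")
acs_data = data_source.get_data(download=True)  # all states; very large

# ACSIncome is one example task; the repository's script may use another.
features, labels, _ = ACSIncome.df_to_pandas(acs_data)
```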
The raw data gets downloaded into the data/2018/5-year folder, which is then processed and saved as data/folktables/features.csv and data/folktables/labels.csv. We will now binarize the features using the following script.
# Run from root directory
python scripts/preprocess_folk_tables.py
After running the script, you should have the final preprocessed dataset as data/folktables/features_bin.csv and data/folktables/labels_bin.csv.
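Binarization turns each attribute into 0/1 indicator columns. The sketch below shows one common way to do this with pandas; it is an illustration, not a description of the scheme implemented in scripts/preprocess_folk_tables.py.

```python
import pandas as pd

features = pd.read_csv("data/folktables/features.csv")

# One-hot encode every column, yielding a 0/1 indicator per attribute value.
# The actual binarization in scripts/preprocess_folk_tables.py may differ
# (e.g., thresholding numeric columns instead).
features_bin = pd.get_dummies(features.astype("category"), dtype=int)
features_bin.to_csv("data/folktables/features_bin.csv", index=False)
```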
We require additional information about the datasets to run our main experiments.
This information includes strata sizes, ground truth demographic parity, etc.
While we can generate this information within each run of the main experiment, it is more efficient to generate this information once and save it for future use.
Run analyze_dataset.py, which computes all of this information and generates the meta files data/<dataset_name>/all_probs.pkl, data/<dataset_name>/all_ys.pkl, and data/<dataset_name>/all_nks.pkl for each dataset.
# Run from root directory
python analyze_dataset.py
This script also prints P(X_i = 1) for each attribute in the dataset, as well as the ground-truth demographic parity, i.e., P(Y = 1 | X_i = 1) - P(Y = 1 | X_i = 0), for each attribute. Tables 2, 3, and 4 in the Appendix of the paper present some of the information generated by analyze_dataset.py.
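For intuition, these quantities can also be recomputed directly from the preprocessed CSVs. A minimal sketch that mirrors the formula above (assuming 0/1-valued features and labels aligned row by row; paths shown for German Credit), not the internals of analyze_dataset.py:

```python
import pandas as pd

features = pd.read_csv("data/german_credit/features.csv")
labels = pd.read_csv("data/german_credit/labels.csv").squeeze()

for attr in features.columns:
    x = features[attr]
    # P(Y = 1 | X_i = 1) - P(Y = 1 | X_i = 0)
    dp = labels[x == 1].mean() - labels[x == 0].mean()
    print(f"{attr}: P(X_i = 1) = {x.mean():.3f}, demographic parity = {dp:.3f}")
```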
To run the two-agent collaboration experiment, run the following script:
scripts/two_agent.sh
You can modify the script to change the dataset. Please set the number of repetitions depending on the chosen dataset, as indicated in the script. These repetitions are chosen considering the size of the dataset (i.e., the corresponding run time) as well as the accuracy of estimation; the sketch below illustrates this trade-off. You must also set the attrs_to_audit variable depending on the dataset, as indicated in the script.
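To see why the number of repetitions matters for accuracy, the toy sketch below repeatedly estimates demographic parity from random subsamples and reports how the averaged estimate tightens. All values here (dataset, attribute, sample size) are illustrative assumptions, unrelated to the actual experiment scripts, and the sketch assumes the attribute takes both values in every subsample.

```python
import numpy as np
import pandas as pd

# Toy illustration (not the experiment code): more repetitions shrink the
# standard error of the averaged demographic-parity estimate.
features = pd.read_csv("data/german_credit/features.csv")
labels = pd.read_csv("data/german_credit/labels.csv").squeeze().to_numpy()
x = features.iloc[:, 0].to_numpy()  # arbitrarily audit the first attribute

rng = np.random.default_rng(0)
for repetitions in (10, 100, 1000):
    estimates = []
    for _ in range(repetitions):
        idx = rng.choice(len(x), size=200, replace=False)  # one audit's budget
        xs, ys = x[idx], labels[idx]
        estimates.append(ys[xs == 1].mean() - ys[xs == 0].mean())
    sem = np.std(estimates) / np.sqrt(repetitions)  # std error of the mean
    print(f"{repetitions:>5} repetitions: mean = {np.mean(estimates):.3f} ± {sem:.3f}")
```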
We provide scripts for each dataset in scripts/multicolab/<dataset>_launcher.sh. Each script takes three arguments: the sampling method, the collaboration strategy, and the number of collaborating agents.
# Run from root directory
scripts/multicolab/german_credit_launcher.sh stratified apriori 3
The results are saved in the results/<dataset>/multicolab folder.
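To sweep over several configurations, you can drive a launcher from Python. A minimal sketch; apart from the example arguments above (stratified, apriori, 3), the listed values are placeholders that must match what the launcher scripts actually accept.

```python
import itertools
import subprocess

# Hypothetical sweep over launcher arguments; adjust the lists to the
# sampling methods, strategies, and agent counts supported by the scripts.
methods = ["stratified"]
strategies = ["apriori"]
agent_counts = [2, 3, 5]

for method, strategy, agents in itertools.product(methods, strategies, agent_counts):
    subprocess.run(
        ["scripts/multicolab/german_credit_launcher.sh", method, strategy, str(agents)],
        check=True,
    )
```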
Lastly, we also provide a script for generating the data for Observation 1 in the paper. This script uses the previously generated meta files, specifically data/<dataset_name>/all_nks.pkl, which stores the size of each stratum.
# Run from root directory
python scripts/analyze_strata.py
The resulting plot is saved as results/plots/largest_stratum.pdf.
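If you want to inspect the strata sizes yourself, you can load the meta file directly. This sketch assumes all_nks.pkl unpickles to a mapping or sequence of per-stratum counts; the exact structure is whatever analyze_dataset.py wrote.

```python
import pickle

with open("data/german_credit/all_nks.pkl", "rb") as f:
    nks = pickle.load(f)  # assumed: per-stratum sample counts

# Flatten to a list of counts whether nks is a dict or a plain sequence.
counts = list(nks.values()) if hasattr(nks, "values") else list(nks)
total = sum(counts)
print(f"strata: {len(counts)}, largest stratum: {max(counts) / total:.2%} of the data")
```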
If you found this code useful, please consider citing our paper:
@article{de2024fairness,
  title={Fairness auditing with multi-agent collaboration},
  author={de Vos, Martijn and Dhasade, Akash and Bourr{\'e}e, Jade Garcia and Kermarrec, Anne-Marie and Merrer, Erwan Le and Rottembourg, Benoit and Tredan, Gilles},
  journal={arXiv preprint arXiv:2402.08522},
  year={2024}
}