Code for our ECAI'24 paper "Fairness Auditing with Multi-Agent Collaboration"

Fairness Auditing with Multi-Agent Collaboration

This repository hosts the code for our work titled "Fairness Auditing with Multi-Agent Collaboration", which appeared in the Proceedings of the 27th European Conference on Artificial Intelligence (ECAI), 2024.

Setting up the Python environment

Create a new conda environment using the provided environment.yml file.

conda env create -f environment.yml

Activate the environment before running any code below.

conda activate audits

Setting up the datasets

German Credit

# Run from root directory
python scripts/preprocess_german_credit.py

The script downloads the dataset, preprocesses it and saves it in the data/german_credit folder. The preprocessed dataset is saved as data/german_credit/features.csv and data/german_credit/labels.csv.

ProPublica

Manually create the folder data/propublica, then download compas-scores-two-years.csv from the ProPublica GitHub repository and save it in the data/propublica folder. Finally, run the following script to complete the preprocessing.

# Run from root directory
python scripts/preprocess_propublica_dataset.py

You should now have the preprocessed dataset in the data/propublica folder as data/propublica/features.csv and data/propublica/labels.csv.

Folktables

Manually create the folder data/folktables before running the following. First, download the dataset using the script below. Note that the download takes a while, as the dataset is large. Additionally, you will need a machine with a large amount of memory (150 GB) to run the download script.

# Run from root directory
python scripts/download_folk_tables.py

The raw data is downloaded into the data/2018/5-year folder, then processed and saved as data/folktables/features.csv and data/folktables/labels.csv. Next, binarize the features using the following script.

# Run from root directory
python scripts/preprocess_folk_tables.py

After running the script, you should have the final preprocessed dataset as data/folktables/features_bin.csv and data/folktables/labels_bin.csv.
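
All three preprocessing pipelines end with the same two-file layout: a features CSV and a labels CSV with one row per individual, and binary (0/1) values after binarization. As a quick sanity check, you can verify that a (features, labels) pair lines up before running the experiments. The snippet below is an illustrative sketch, not part of the repository; the helper name and the toy data are assumptions, and in practice you would load the CSVs listed above with pandas.

```python
import pandas as pd

def sanity_check(features: pd.DataFrame, labels: pd.DataFrame) -> None:
    """Basic consistency checks for a preprocessed (features, labels) pair."""
    # The two files describe the same individuals, so row counts must match.
    assert len(features) == len(labels), "row counts must match"
    # Binarized datasets should contain only 0/1 values.
    assert features.isin([0, 1]).all().all(), "expected binary features"
    assert labels.iloc[:, 0].isin([0, 1]).all(), "expected binary labels"

# Toy stand-in for e.g. data/german_credit/features.csv and labels.csv
# (column names are illustrative only).
features = pd.DataFrame({"sex": [1, 0, 1, 0], "age_gt_25": [1, 1, 0, 0]})
labels = pd.DataFrame({"y": [1, 0, 0, 1]})
sanity_check(features, labels)
print("sanity check passed")
```

With the real data, you would replace the toy frames with `pd.read_csv("data/<dataset_name>/features.csv")` and the corresponding labels file.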

Analyzing datasets to generate meta files (and ensure correct preprocessing)

We require additional information about the datasets to run the main experiments, including strata sizes and the ground-truth demographic parity. While this information could be generated within each run of the main experiment, it is more efficient to generate it once and save it for future use. Run analyze_dataset.py, which does all of the above and generates the meta files data/<dataset_name>/all_probs.pkl, data/<dataset_name>/all_ys.pkl and data/<dataset_name>/all_nks.pkl for each dataset.

# Run from root directory
python analyze_dataset.py

This script also prints P(X_i = 1) for each attribute in the dataset, as well as the ground-truth demographic parity, i.e. P(Y = 1 | X_i = 1) - P(Y = 1 | X_i = 0), for each attribute. Tables 2, 3 and 4 in the Appendix of the paper present some of the information generated by analyze_dataset.py.
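
As a concrete illustration of the quantity printed above, the demographic parity of a binary attribute can be computed directly from the label vector and the attribute vector. The sketch below uses synthetic arrays; the helper name and data are illustrative and not taken from the repository.

```python
import numpy as np

def demographic_parity(y: np.ndarray, x: np.ndarray) -> float:
    """P(Y = 1 | X = 1) - P(Y = 1 | X = 0) for binary arrays y and x."""
    return y[x == 1].mean() - y[x == 0].mean()

# Synthetic example: attribute x splits the population into two strata.
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])
x = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(demographic_parity(y, x))  # 0.75 - 0.25 = 0.5
```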

Running two-agent collaboration (Figure 3)

To run the two-agent collaboration experiment, run the following script:

scripts/two_agent.sh

You can modify the script to change the dataset. Set the number of repetitions depending on the chosen dataset, as indicated in the script; these repetitions are chosen considering the size of the dataset (i.e., the corresponding run time) as well as the accuracy of estimation. You must also set the attrs_to_audit variable depending on the dataset, as indicated in the script.

Running multi-agent collaboration (Figure 4 and 5)

We provide scripts for each dataset in scripts/multicolab/<dataset>_launcher.sh. Each script takes 3 arguments: the sampling method, the collaboration strategy, and the number of collaborating agents $k$ (between 2 and 5 inclusive).

# Run from root directory
scripts/multicolab/german_credit_launcher.sh stratified apriori 3

The results are saved in the results/<dataset>/multicolab folder. For each choice of $k$, there exist $5 \choose k$ possible combinations of agents. The filename reports the chosen combination of agents along with the number of agents $k$.
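
For reference, the set of agent combinations that index the result files can be enumerated with the standard library; the sketch below shows the count $\binom{5}{k}$ for five agents (the agent labels are illustrative, not the repository's naming scheme).

```python
from itertools import combinations
from math import comb

agents = ["A", "B", "C", "D", "E"]  # 5 agents, as in the paper's setup
k = 3
combos = list(combinations(agents, k))
# combinations() yields every size-k subset in lexicographic order,
# so len(combos) equals C(5, k).
print(len(combos), comb(5, k))  # 10 10
print(combos[0])  # ('A', 'B', 'C')
```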

Evaluating observation 1 (Figure 2)

Lastly, we also provide a script for generating the data for Observation 1 in the paper. This script uses the previously generated meta files, specifically data/<dataset_name>/all_nks.pkl, which stores the size of each stratum.

# Run from root directory
python scripts/analyze_strata.py

The resulting plot is saved as results/plots/largest_stratum.pdf.

Citation

If you found this code useful, please consider citing our paper:

@article{de2024fairness,
  title={Fairness auditing with multi-agent collaboration},
  author={de Vos, Martijn and Dhasade, Akash and Bourr{\'e}e, Jade Garcia and Kermarrec, Anne-Marie and Merrer, Erwan Le and Rottembourg, Benoit and Tredan, Gilles},
  journal={arXiv preprint arXiv:2402.08522},
  year={2024}
}
