[EMNLP 2023] Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models

coastalcph/gender-neutral-vl

Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models

This is the code to replicate the experiments described in the paper (to appear in EMNLP23):

Laura Cabello, Emanuele Bugliarello, Stephanie Brandl and Desmond Elliott. Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.

Repository Setup

You can clone this repository by running:
git clone git@github.com:coastalcph/gender-neutral-vl.git

1. Create a fresh conda environment and install all dependencies.

conda create -n genvlm python=3.9
conda activate genvlm
pip install -r requirements.txt

2. Install PyTorch

conda install pytorch=1.12.0=py3.9_cuda11.3_cudnn8.3.2_0 torchvision=0.13.0=py39_cu113 cudatoolkit=11.3 -c pytorch

The following steps are required to run code from VOLTA:

3. Install apex. If you use a cluster, you may want to first run commands like the following:

module load cuda/10.1.105
module load gcc/8.3.0-cuda

4. Set up the refer submodule for Referring Expression Comprehension:

cd src/LXMERT/volta/tools/refer; make

5. Install this codebase as a package in this environment.

python setup.py develop

Repository Config

The main configuration needed to run the scripts in the experiments/ folder is stored in main.config. Edit this file to match your own setup before running any script.

Data

You can download the preprocessed gender-neutral data from here. These data files are used for continued pretraining on gender-neutral data.

Details on the method used to generate this data can be found in the paper. The mappings between gendered words and neutral words are in Mappings.csv (and in Appendix A). The code to reproduce our preprocessing pipeline, or to apply it to your own data, is stored in src/preprocessing/. Scripts to run the code are in experiments/preprocessing/.
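To make the idea of a gendered-to-neutral word mapping concrete, here is a minimal sketch of how such a mapping could be applied to a caption. It is not the pipeline in src/preprocessing/: the function name `neutralize`, the `EXAMPLE_MAPPINGS` pairs, and the whole-word, case-preserving replacement strategy are all assumptions made for illustration. The actual pairs are the ones listed in Mappings.csv and Appendix A.

```python
import re

# Illustrative pairs only (assumed); the real list lives in Mappings.csv / Appendix A.
# Keys are expected to be lowercase.
EXAMPLE_MAPPINGS = {
    "man": "person",
    "woman": "person",
    "men": "people",
    "women": "people",
    "boy": "child",
    "girl": "child",
}


def neutralize(caption, mappings):
    """Replace whole-word gendered terms with their neutral counterpart."""
    if not mappings:
        return caption
    # Longest keys first, so longer entries are tried before shorter ones.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        word = match.group(0)
        neutral = mappings[word.lower()]
        # Keep a leading capital if the original word had one.
        return neutral.capitalize() if word[0].isupper() else neutral

    return pattern.sub(swap, caption)


print(neutralize("A woman and two men walk past a mannequin.", EXAMPLE_MAPPINGS))
# A person and two people walk past a mannequin.
```

The word-boundary anchors keep substrings such as "man" in "mannequin" untouched. A real pipeline would likely need more than this, for example handling of pronouns and agreement.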

Lists of common nouns that co-occur with gender entities in the corresponding training data are stored in src/preprocessing/top_objects/. The top-N objects are used to evaluate bias amplification (N=100 to measure bias in pretraining, N=50 to measure bias in downstream tasks). See Section 4.2 and Section 5.3 for details.
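As a rough illustration of what a top-N co-occurrence list is, the sketch below counts how often each object noun shares a caption with a gender entity and keeps the N most frequent per group. Everything here is hypothetical: the `GENDER_WORDS` lists, the whitespace tokenization, and the function `top_cooccurring_objects` are assumptions for the example and are not taken from src/preprocessing/. The bias amplification measure itself is described in the paper and is not reproduced here.

```python
from collections import Counter

# Hypothetical gender-entity word lists, for illustration only.
GENDER_WORDS = {
    "male": {"man", "men", "boy"},
    "female": {"woman", "women", "girl"},
}


def top_cooccurring_objects(captions, objects, gender_words, n):
    """Per gender group, return the n objects that most often share a caption with it."""
    counts = {group: Counter() for group in gender_words}
    for caption in captions:
        tokens = set(caption.lower().split())  # naive whitespace tokenization
        present = tokens & objects
        for group, words in gender_words.items():
            if tokens & words:
                # Each object counts at most once per caption.
                counts[group].update(present)
    return {
        group: [obj for obj, _ in counter.most_common(n)]
        for group, counter in counts.items()
    }


captions = [
    "a man riding a skateboard",
    "a man holding a skateboard",
    "a man with an umbrella",
    "a woman holding an umbrella",
    "a woman with an umbrella",
    "a woman walking a dog",
]
objects = {"skateboard", "umbrella", "dog"}

print(top_cooccurring_objects(captions, objects, GENDER_WORDS, n=2))
# {'male': ['skateboard', 'umbrella'], 'female': ['umbrella', 'dog']}
```

With the values stated above, `n` would be 100 for pretraining data and 50 for downstream tasks.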

* Note that we use the same COCO train split used for pretraining LXMERT, which is different from the original COCO train split or the Karpathy split.

* Note that our CC3M files map captions to image ids obtained from filenames as done in VOLTA.

Models

Our pretrained models can be downloaded from here, where third_party/ contains the original weights, while Pretrain_{CC3M,COCO}_neutral/ contain the weights after continued pretraining on gender-neutral data.

Model configurations depend on the model family. Files are stored in:

Training and Evaluation

We provide bash scripts to train (i.e. continued pretraining or fine-tuning) and evaluate models in experiments/. These include the following models, as specified in our experimental setup (Section 5):

[Image: models included in the experimental setup (Section 5)]

Task configuration files are stored in:

Code to plot results is shared in Jupyter Notebooks in notebooks/.

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data sets are subject to their respective licenses.
If you find our code/data/models or ideas useful in your research, please consider citing the paper:

@inproceedings{cabello-etal-2023-evaluating,
    title = "Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models",
    author = "Cabello, Laura  and
      Bugliarello, Emanuele   and
      Brandl, Stephanie  and
      Elliott, Desmond",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2310.17530",
}

Acknowledgement

Our codebase heavily relies on these excellent repositories:
