ACL 2022
docker run -it --gpus all --name <docker_name> --ipc=host -v <project_path>:/opt/codes nvcr.io/nvidia/pytorch:20.02-py3 bash
pip install torch==1.2.0
pip install transformers==3.0.2
jupyter notebook --notebook-dir=/opt/codes --ip=0.0.0.0 --no-browser --allow-root
Download the CIFAR-10, SNLI, SST, and Yahoo! Answers datasets from the web, then process them with the 00_EDA.ipynb notebook.
git clone https://github.com/xszheng2020/memorization.git
cd cifar
bash ./scripts/run_if_attr_42.sh # compute the memorization scores and memorization attributions
bash ./scripts/run_mem_<X>.sh # train the model after dropping the top-X% most memorized instances
bash ./scripts/run_random_<X>.sh # train the model after dropping X% of the instances at random
bash ./scripts/eval_attr_mem.sh # evaluate the memorization attributions
bash ./scripts/eval_attr_random.sh # evaluate the random attributions
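To make the difference between the `run_mem_<X>.sh` and `run_random_<X>.sh` settings concrete, here is a minimal sketch of the two ways of selecting training instances. It is not the repository's code: the function names, the toy scores, and the seed are hypothetical, and the real scripts may select and store indices differently.

```python
import random


def drop_top_memorized(scores, percent):
    """Return the indices kept after dropping the top `percent`% by score.

    `scores` is assumed to hold one memorization score per training instance.
    """
    n_drop = int(len(scores) * percent / 100)
    # Rank instance indices from highest to lowest memorization score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(scores)) if i not in dropped]


def drop_random(n, percent, seed=0):
    """Return the indices kept after dropping `percent`% of n instances at random."""
    n_drop = int(n * percent / 100)
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    dropped = set(rng.sample(range(n), n_drop))
    return [i for i in range(n) if i not in dropped]


# Toy scores for 10 instances (made-up values, for illustration only).
scores = [0.9, 0.1, 0.5, 0.7, 0.2, 0.05, 0.3, 0.8, 0.4, 0.6]

kept_mem = drop_top_memorized(scores, 20)    # drops indices 0 and 7
kept_rand = drop_random(len(scores), 20)     # drops 2 random indices

print(kept_mem)        # [1, 2, 3, 4, 5, 6, 8, 9]
print(len(kept_rand))  # 8
```

Both functions remove the same number of instances, so any difference in the retrained model's accuracy can be attributed to which instances were removed rather than how many.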
The Jupyter notebooks show how to analyze the results and reproduce most of the figures in the paper.
- Paper: https://arxiv.org/abs/2203.12171
- Related projects:
  - fast-influence-functions: https://github.com/salesforce/fast-influence-functions