An Empirical Study of Memorization in NLP

ACL 2022

Installing / Getting started

docker run -it --gpus all --name <docker_name> --ipc=host -v <project_path>:/opt/codes nvcr.io/nvidia/pytorch:20.02-py3 bash
pip install torch==1.2.0
pip install transformers==3.0.2
jupyter notebook --notebook-dir=/opt/codes --ip=0.0.0.0 --no-browser --allow-root

Prepare the datasets

Download the CIFAR-10, SNLI, SST, and Yahoo! Answers datasets from the web, then process them with the 00_EDA.ipynb notebook.

Run the experiments

git clone https://github.com/xszheng2020/memorization.git
cd memorization/cifar
bash ./scripts/run_if_attr_42.sh # compute the memorization scores and memorization attributions
bash ./scripts/run_mem_<X>.sh # train the model with the top-X% most memorized instances dropped
bash ./scripts/run_random_<X>.sh # train the model with X% of the instances dropped at random
bash ./scripts/eval_attr_mem.sh # evaluate the memorization attributions
bash ./scripts/eval_attr_random.sh # evaluate the random attributions
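
To make the difference between the run_mem and run_random settings concrete, here is a minimal Python sketch of the two selection strategies. It is an illustration only, not the code in this repository: the function names `drop_top_memorized` and `drop_random`, the toy scores, and the seed value are all assumptions made for the example. In the actual pipeline the scores would come from the memorization-score step.

```python
import random


def drop_top_memorized(scores, percent):
    """Return the indices kept after dropping the `percent`% highest-scoring instances.

    Hypothetical helper for illustration; `scores` is one score per training instance.
    """
    n_drop = int(len(scores) * percent / 100)
    # Rank instance indices from the highest to the lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(scores)) if i not in dropped]


def drop_random(n_instances, percent, seed=42):
    """Return the indices kept after dropping `percent`% of instances uniformly at random.

    Hypothetical helper for illustration; the seed is an arbitrary choice.
    """
    n_drop = int(n_instances * percent / 100)
    rng = random.Random(seed)
    dropped = set(rng.sample(range(n_instances), n_drop))
    return [i for i in range(n_instances) if i not in dropped]


# Toy scores for ten training instances (made-up values, not results from the paper).
toy_scores = [0.05, 0.90, 0.10, 0.75, 0.02, 0.30, 0.60, 0.01, 0.20, 0.15]

kept_mem = drop_top_memorized(toy_scores, percent=20)
kept_rand = drop_random(len(toy_scores), percent=20)

print(kept_mem)        # [0, 2, 4, 5, 6, 7, 8, 9]
print(len(kept_rand))  # 8
```

Both strategies remove the same number of instances, so any gap in downstream accuracy between them can be attributed to which instances were removed rather than to how many.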

Analyze the results

The Jupyter notebooks show how to analyze the results and reproduce most of the figures in the paper.

Links

Paper: https://arxiv.org/abs/2203.12171
Related projects:
- fast-influence-functions: https://github.com/salesforce/fast-influence-functions
