This repo implements the experiment described in "Unpacking the Resilience of SNLI Contradiction Examples to Attacks"

ckvermaAI/SNLI-Attack-Analysis

SNLI-Attack-Analysis

Setup for starters_code

  • Clone this repo and switch the working directory
git clone https://github.com/ckvermaAI/NLP_project.git
cd NLP_project/starters_code
  • Install the requirements
pip install -r requirements.txt
  • Run the model
# Training (checkpoints will be saved under output_dir)
python3 run.py --do_train --task nli --dataset snli --output_dir ./trained_model/

# Evaluation
python3 run.py --do_eval --task nli --dataset snli --model ./trained_model/ --output_dir ./eval_output/
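Because the experiment concerns how individual SNLI classes behave, a per-label breakdown of the evaluation results can be more informative than a single accuracy number. The following is a minimal sketch, not part of this repo. It assumes the evaluation step writes one JSON object per line with integer `label` and `predicted_label` fields; both that format and the file name `eval_predictions.jsonl` are assumptions about the starter code's output. The helper names `load_jsonl` and `per_label_accuracy` are hypothetical.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read a JSON-lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def per_label_accuracy(records):
    """Return {gold_label: accuracy} for records carrying
    'label' and 'predicted_label' (field names are assumed)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        gold = rec["label"]
        total[gold] += 1
        if rec["predicted_label"] == gold:
            correct[gold] += 1
    return {label: correct[label] / total[label] for label in sorted(total)}


if __name__ == "__main__":
    # In-memory demo; with real output one might instead call
    # load_jsonl("./eval_output/eval_predictions.jsonl") (assumed path).
    demo = [
        {"label": 0, "predicted_label": 0},
        {"label": 1, "predicted_label": 1},
        {"label": 1, "predicted_label": 2},
    ]
    print(per_label_accuracy(demo))
```

The results are keyed by the integer label stored in the data. The sketch deliberately does not map integers to class names, since the starter code and the universal_triggers code may index the classes differently.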

Setup for universal_triggers

  • Clone the AllenNLP fork and install it (required by universal_triggers)
git clone https://github.com/ckvermaAI/allennlp-fork.git
pip install allennlp-fork/
  • Run the script
python universal_triggers/triggers.py

Generating the triggers

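# Use the HotFlip attack to generate universal triggers.
# Run these from the universal_triggers/ directory; the hotflip/ directory must exist so tee can write the logs.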

# 1) Attack on entailment class, to flip the label to contradiction
python triggers.py --label_filter entailment --target_label 1 2>&1 | tee hotflip/entailment-contradiction.log
# 2) Attack on entailment class, to flip the label to neutral
python triggers.py --label_filter entailment --target_label 2 2>&1 | tee hotflip/entailment-neutral.log

# 3) Attack on contradiction class, to flip the label to entailment
python triggers.py --label_filter contradiction --target_label 0 2>&1 | tee hotflip/contradiction-entailment.log
# 4) Attack on contradiction class, to flip the label to neutral
python triggers.py --label_filter contradiction --target_label 2 2>&1 | tee hotflip/contradiction-neutral.log

# 5) Attack on neutral class, to flip the label to entailment
python triggers.py --label_filter neutral --target_label 0 2>&1 | tee hotflip/neutral-entailment.log
# 6) Attack on neutral class, to flip the label to contradiction
python triggers.py --label_filter neutral --target_label 1 2>&1 | tee hotflip/neutral-contradiction.log
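The six commands above cover every ordered (source class, target class) pair. The label indices can be read off from them: 0 is entailment, 1 is contradiction, 2 is neutral. As a minimal sketch, the same command lines can be generated programmatically. The helper `build_attack_commands` is hypothetical and not part of this repo; it only builds the strings and leaves running them to the user.

```python
from itertools import permutations

# Label indices as implied by the --target_label values in the commands above.
LABEL_TO_INDEX = {"entailment": 0, "contradiction": 1, "neutral": 2}


def build_attack_commands(script="triggers.py", log_dir="hotflip"):
    """Build one shell command per ordered (source, target) label pair."""
    commands = []
    for source, target in permutations(LABEL_TO_INDEX, 2):
        log_path = f"{log_dir}/{source}-{target}.log"
        commands.append(
            f"python {script} --label_filter {source} "
            f"--target_label {LABEL_TO_INDEX[target]} 2>&1 | tee {log_path}"
        )
    return commands


if __name__ == "__main__":
    for command in build_attack_commands():
        print(command)
```

Printing the commands rather than executing them keeps the sketch free of side effects. Each attack can then be launched by hand, or the output can be piped into a shell.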

Creating the dataset with generated triggers

  1. Use ./universal_triggers/build_dataset.py to generate the dataset.
  2. Update the inputs to the create_dataset function as required, then run the script.
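To illustrate what a triggered dataset could look like, here is a minimal sketch. It assumes each example is a dict with `premise`, `hypothesis`, and `label` keys, and that the trigger is prepended to the hypothesis. Both are assumptions for illustration and do not describe the actual behavior of `build_dataset.py` or `create_dataset`. The function names and the placeholder trigger tokens are hypothetical.

```python
def add_trigger(example, trigger_tokens):
    """Return a copy of the example with the trigger prepended to the hypothesis."""
    triggered = dict(example)  # shallow copy so the original is left unchanged
    triggered["hypothesis"] = " ".join(trigger_tokens) + " " + example["hypothesis"]
    return triggered


def build_triggered_dataset(examples, trigger_tokens, label_filter=None):
    """Apply the trigger to every example, optionally keeping only one gold label."""
    return [
        add_trigger(ex, trigger_tokens)
        for ex in examples
        if label_filter is None or ex["label"] == label_filter
    ]


if __name__ == "__main__":
    examples = [
        {"premise": "A man is sleeping.", "hypothesis": "A man rests.", "label": 0},
        {"premise": "A dog runs.", "hypothesis": "A cat sits.", "label": 1},
    ]
    # "tok1" is a placeholder, not a trigger found by the attack.
    for row in build_triggered_dataset(examples, ["tok1"], label_filter=0):
        print(row)
```

Filtering by gold label mirrors the `--label_filter` option used when generating the triggers, so each trigger is applied only to the class it was searched against.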

Source code

Starter code

Pulled from https://github.com/gregdurrett/fp-dataset-artifacts

Universal triggers

Pulled from https://github.com/Eric-Wallace/universal-triggers/
