RobustLR

RobustLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners

Overview of RobustLR

We expect a strong deductive reasoning model to be robust to logical variations in the input. In the motivating example (motivation.png), the model fails to understand the logical conjunction in the second example and predicts the wrong entailment for the statement.

Dependencies

Dependencies can be installed using requirements.txt.

Pipeline for running code

First, download the data from this link into a folder named `data`.
Next, tokenize the data to be used for training a model.
Then, train a model using the tokenized data, which saves the checkpoint in the `saved` folder.
Lastly, evaluate the checkpoint on RobustLR, which consists of 3 types of contrast sets and 3 types of equivalence sets.

Below, we show a step-by-step pipeline to train and evaluate a RoBERTa checkpoint. The example uses the `all` training dataset, which contains all the logical operators (AND, OR, NOT) at training time.

Process data to generate tokenized dataset

python process_dataset.py --dataset train_data/all --arch roberta_large_race

Finetune RoBERTa checkpoint

We use a RoBERTa checkpoint finetuned on RACE. The model can be changed in the config at `src/configs/config.yaml`.

python main.py --dataset all --train_dataset all --dev_dataset all --test_dataset all
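The two training steps above can be chained so that finetuning only starts if tokenization succeeds. The following is a minimal sketch, not part of the repository: `training_commands`, `run_training`, and the `dry_run` switch are hypothetical names, the script is assumed to run from the directory that contains `process_dataset.py` and `main.py`, and it only reuses the flags shown above.

```python
import shlex
import subprocess
import sys


def training_commands(name="all", arch="roberta_large_race"):
    """Return the tokenization and finetuning commands for one training set.

    Hypothetical helper: it only reassembles the two commands shown above.
    """
    tokenize = [
        sys.executable, "process_dataset.py",
        "--dataset", f"train_data/{name}",
        "--arch", arch,
    ]
    finetune = [
        sys.executable, "main.py",
        "--dataset", name,
        "--train_dataset", name,
        "--dev_dataset", name,
        "--test_dataset", name,
    ]
    return [tokenize, finetune]


def run_training(name="all", dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in training_commands(name):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so finetuning is skipped if tokenization fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands for the `all` dataset.
    run_training("all", dry_run=True)
```

Calling `run_training("all", dry_run=False)` would execute both steps. Training dataset names other than `all` are not listed here, so any other value passed as `name` is an assumption about the contents of `train_data`.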

Evaluation on RobustLR diagnostic benchmark

In the commands below, replace `<model_ckpt>` with the path of the checkpoint saved during the finetuning step above.

Conjunction Contrast Set

python process_dataset.py --dataset robustlr/logical_contrast/conj_contrast_with_distractors --eval

python main.py --override evaluate --dataset conj_contrast_with_distractors --train_dataset conj_contrast_with_distractors --dev_dataset conj_contrast_with_distractors --test_dataset conj_contrast_with_distractors --ckpt_path <model_ckpt>

Disjunction Contrast Set

python process_dataset.py --dataset robustlr/logical_contrast/disj_contrast_with_distractors --eval

python main.py --override evaluate --dataset disj_contrast_with_distractors --train_dataset disj_contrast_with_distractors --dev_dataset disj_contrast_with_distractors --test_dataset disj_contrast_with_distractors --ckpt_path <model_ckpt>

Negation Contrast Set

python process_dataset.py --dataset robustlr/logical_contrast/neg_contrast_with_distractors --eval

python main.py --override evaluate --dataset neg_contrast_with_distractors --train_dataset neg_contrast_with_distractors --dev_dataset neg_contrast_with_distractors --test_dataset neg_contrast_with_distractors --ckpt_path <model_ckpt>

Contrapositive Equivalence Set

python process_dataset.py --dataset robustlr/logical_equivalence/contrapositive_equiv --eval

python main.py --override evaluate --dataset contrapositive_equiv --train_dataset contrapositive_equiv --dev_dataset contrapositive_equiv --test_dataset contrapositive_equiv --ckpt_path <model_ckpt>

Distributive 1 Equivalence Set

python process_dataset.py --dataset robustlr/logical_equivalence/distributive1_equiv --eval

python main.py --override evaluate --dataset distributive1_equiv --train_dataset distributive1_equiv --dev_dataset distributive1_equiv --test_dataset distributive1_equiv --ckpt_path <model_ckpt>

Distributive 2 Equivalence Set

python process_dataset.py --dataset robustlr/logical_equivalence/distributive2_equiv --eval

python main.py --override evaluate --dataset distributive2_equiv --train_dataset distributive2_equiv --dev_dataset distributive2_equiv --test_dataset distributive2_equiv --ckpt_path <model_ckpt>
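Since the six evaluations differ only in the dataset folder and name, the commands can be generated in a loop instead of being copied by hand. The sketch below is one way to do that under stated assumptions: `EVAL_SETS` and `evaluation_commands` are hypothetical names that do not exist in the repository, and the dataset paths and flags are copied from the commands above.

```python
import shlex
import sys

# Dataset folders and names exactly as they appear in the commands above.
EVAL_SETS = {
    "robustlr/logical_contrast": [
        "conj_contrast_with_distractors",
        "disj_contrast_with_distractors",
        "neg_contrast_with_distractors",
    ],
    "robustlr/logical_equivalence": [
        "contrapositive_equiv",
        "distributive1_equiv",
        "distributive2_equiv",
    ],
}


def evaluation_commands(ckpt_path):
    """Return (tokenize, evaluate) command pairs for all six RobustLR sets."""
    pairs = []
    for folder, names in EVAL_SETS.items():
        for name in names:
            tokenize = [
                sys.executable, "process_dataset.py",
                "--dataset", f"{folder}/{name}",
                "--eval",
            ]
            evaluate = [
                sys.executable, "main.py",
                "--override", "evaluate",
                "--dataset", name,
                "--train_dataset", name,
                "--dev_dataset", name,
                "--test_dataset", name,
                "--ckpt_path", ckpt_path,
            ]
            pairs.append((tokenize, evaluate))
    return pairs


if __name__ == "__main__":
    # "<model_ckpt>" is a placeholder; substitute the saved checkpoint path.
    for tokenize, evaluate in evaluation_commands("<model_ckpt>"):
        print(shlex.join(tokenize))
        print(shlex.join(evaluate))
```

The script only prints the commands, so they can be inspected before anything runs. To execute them, each list can be passed to `subprocess.run(cmd, check=True)`.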

For clarifications, comments, or suggestions, please create an issue or contact Soumya.

