Commit: minor
pomonam committed Jun 26, 2024
1 parent 674869d commit 05ce8eb
Showing 2 changed files with 6 additions and 5 deletions.
9 changes: 5 additions & 4 deletions examples/glue/README.md
@@ -63,7 +63,7 @@ Can we remove top positively influential training examples to make some queries
selects a correctly classified query data point, removes the top-k positively influential training samples, and retrains the network with the modified
dataset to see if that query data point gets misclassified.
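The removal step described above can be sketched as follows (a minimal illustration, not the example's actual code; `influence_scores` stands in for the pairwise score matrix that `analyze.py` produces):

```python
import numpy as np

def remove_top_influential(influence_scores, train_dataset, query_index, k):
    """Drop the top-k most positively influential training examples
    for one query, returning the reduced training set."""
    scores = influence_scores[query_index]     # shape: (num_train,)
    top_k = set(np.argsort(scores)[::-1][:k])  # indices with the largest scores
    return [ex for i, ex in enumerate(train_dataset) if i not in top_k]
```

Retraining on the reduced set and re-evaluating the query then shows whether the removal flipped its prediction.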

-We first need to compute pairwise influence scores for the `RTE` dataset:
+We first need to compute pairwise influence scores for the `RTE` dataset (A6000 GPU was used to run these experiments):

```bash
python train.py --dataset_name rte \
@@ -76,13 +76,13 @@ python train.py --dataset_name rte \
--seed 1004

python analyze.py --dataset_name rte \
-  --query_batch_size 175 \
+  --query_batch_size 70 \
--train_batch_size 128 \
--checkpoint_dir ./checkpoints \
--factor_strategy ekfac

python analyze.py --dataset_name rte \
-  --query_batch_size 175 \
+  --query_batch_size 139 \
--train_batch_size 128 \
--checkpoint_dir ./checkpoints \
--factor_strategy identity
```

@@ -97,7 +97,8 @@ python analyze.py --dataset_name rte \
## Evaluating Linear Datamodeling Score

The `evaluate_lds.py` script computes the [linear datamodeling score (LDS)](https://arxiv.org/abs/2303.14186). It measures the LDS obtained by
-retraining the network 500 times with different subsets of the dataset (5 repeats and 100 masks). By running `evaluate_lds.py`, we obtain `xx` LDS (we get `xx` LDS with half precision).
+retraining the network 500 times with different subsets of the dataset (5 repeats and 100 masks).
+By running `evaluate_lds.py`, we obtain `xx` LDS (we get `xx` LDS with half precision).
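As a rough sketch of what an LDS computation like this looks like (assumed shapes and helper names, not the actual `evaluate_lds.py`): the predicted margin for each retraining subset is the sum of influence scores over that subset, and the LDS is the rank correlation between predicted and measured margins, averaged over queries.

```python
import numpy as np

def _spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks (no tie handling).
    ranks = lambda x: np.argsort(np.argsort(x))
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

def linear_datamodeling_score(influence_scores, masks, measured_margins):
    """influence_scores: (num_queries, num_train) attribution matrix.
    masks: (num_masks, num_train) 0/1 subset indicators.
    measured_margins: (num_masks, num_queries) query margins measured
    after retraining on each subset (averaged over repeats)."""
    predicted = masks @ influence_scores.T  # (num_masks, num_queries)
    per_query = [_spearman(predicted[:, q], measured_margins[:, q])
                 for q in range(influence_scores.shape[0])]
    return float(np.mean(per_query))
```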

The script also includes functionality to print out top influential sequences for a given query.

2 changes: 1 addition & 1 deletion examples/glue/analyze.py
@@ -176,7 +176,7 @@ def main():
dataset=train_dataset,
per_device_batch_size=None,
factor_args=factor_args,
-        overwrite_output_dir=True,
+        overwrite_output_dir=False,
initial_per_device_batch_size_attempt=512,
)
