This project seeks to predict surprising event boundaries in stories.
pip install -r requirements.txt
Python 3 is required.
GPU (>= 16GB memory) is highly recommended.
sh extract_all_features.sh
Running this source code requires several external resources that must be downloaded separately. So that the features can be inspected without obtaining each resource, all extracted features are provided under data/.
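As a quick way to look at one of the feature files under data/, the following is a minimal sketch that reads a CSV and reports its column names and row count. The helper name `summarize_features` is hypothetical and not part of this repository, and the sketch makes no assumption about which columns the files contain.

```python
import csv


def summarize_features(path):
    """Return (column_names, row_count) for a features CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # first record holds the column names
        n_rows = sum(1 for _ in reader)  # remaining records are data rows
    return header, n_rows


# Example call (assumes the file exists locally; the path is taken from the
# inference command in this README):
# columns, n_rows = summarize_features(
#     "data/all_features_including_annotations_prev_sent_with_prior_confidence.csv"
# )
# print(len(columns), "columns,", n_rows, "rows")
```

The standard-library `csv` module is used here only to keep the sketch dependency-free; any dataframe library would work equally well.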
External resources required:
- Event-annotated data (See Sap et al., 2021)
- Story Cloze Test dataset (See Mostafazadeh et al., 2016)
- ATOMIC relation dataset (See Sap et al., 2019)
- GLUCOSE relation dataset (See Mostafazadeh et al., 2020)
- Realis-annotated dataset (See Sims et al., 2019)
- URL and access key for the Turing-NLG model (See Rosset, 2020)
To train on detecting event boundaries:
sh train.sh event_boundaries
To train on identifying commonsense and nonsense story endings:
sh train.sh story_cloze
To predict event boundaries on the Story Cloze Test dataset:
python surprise_ranker_model.py --config_name surprise_ranker_lr_5e-6_gru_new_prior_confidence_limit_30_1 --lr 5e-6 --train_dataset_filename data/all_features_including_annotations_prev_sent_with_prior_confidence.csv --gru True --gru_length_limit 30 --feature_mode prior_confidence --fold_number 1 --inference_only True --load_trained_model True --infer_story_cloze True --story_cloze_train_dataset_filename data/all_features_including_annotations_story_cloze_with_prior_confidence.csv
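Because the command above is long, it can be easier to read and edit when the flags are listed one per line. The following is a minimal sketch that rebuilds the same command from a dictionary. The names `INFERENCE_FLAGS` and `build_inference_command` are hypothetical; all flag names and values are copied verbatim from the command above.

```python
import shlex

# Flags copied verbatim from the inference command in this README.
INFERENCE_FLAGS = {
    "--config_name": "surprise_ranker_lr_5e-6_gru_new_prior_confidence_limit_30_1",
    "--lr": "5e-6",
    "--train_dataset_filename": "data/all_features_including_annotations_prev_sent_with_prior_confidence.csv",
    "--gru": "True",
    "--gru_length_limit": "30",
    "--feature_mode": "prior_confidence",
    "--fold_number": "1",
    "--inference_only": "True",
    "--load_trained_model": "True",
    "--infer_story_cloze": "True",
    "--story_cloze_train_dataset_filename": "data/all_features_including_annotations_story_cloze_with_prior_confidence.csv",
}


def build_inference_command(flags=INFERENCE_FLAGS):
    """Return the command as an argument list suitable for subprocess.run."""
    argv = ["python", "surprise_ranker_model.py"]
    for name, value in flags.items():
        argv += [name, value]
    return argv


# Print the command as a single shell line for inspection.
print(shlex.join(build_inference_command()))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(build_inference_command(), check=True)` from the repository root.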
Run the following scripts after model training. ROOT_DIR refers to the root folder that contains all of the trained model config folders.
To plot feature weights and see which features are most informative:
python interpret_features.py --root_dir ROOT_DIR
To conduct significance testing using McNemar's test:
python mcnemar_test.py --root_dir ROOT_DIR
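For readers unfamiliar with McNemar's test, the following is a minimal sketch of the exact (binomial) variant for comparing two models on the same examples. It is not taken from mcnemar_test.py, which may use a different variant (for example the chi-square approximation). The function names and the toy labels are hypothetical.

```python
from math import comb


def discordant_counts(gold, pred_a, pred_b):
    """Count examples where exactly one of the two models is correct.

    Returns (b, c): b = A right and B wrong, c = A wrong and B right.
    """
    b = c = 0
    for g, a, p in zip(gold, pred_a, pred_b):
        a_ok, b_ok = (a == g), (p == g)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy example with made-up labels, for illustration only.
gold = [1, 0, 1, 1]
model_a = [1, 0, 0, 1]
model_b = [0, 0, 1, 0]
b, c = discordant_counts(gold, model_a, model_b)
print(b, c, mcnemar_exact(b, c))
```

Only the examples on which the two models disagree in correctness enter the test, which is why accuracy alone is not enough to compute it.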
To correlate story endings with predicted event boundaries:
python correlate_story_cloze_to_predicted_surprise.py --filename ROOT_DIR/surprise_ranker_lr_5e-6_gru_new_prior_confidence_limit_30_1/surprise_discriminator_eval_tokens_epoch_.csv