Code, data, and models from "Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest" EMNLP 2020 W-NUT
The data is in `/data`. As per Twitter guidelines, it only contains the tweet IDs and not the full tweet content; the IDs can be rehydrated through the Twitter API (see the sketch after the file list below).
- `keywords_english.txt`: Civil unrest-related keywords
- `known_annotations.csv`: "Ground truth" annotations by the authors, used to evaluate Mechanical Turk worker annotations
- `labelled_tweets_is_general_unrest.csv`: Labels for tweets (IDs only) and whether they were annotated as "general unrest" and "specific/nonspecific event"
- `labelled_tweets_is_protest_event.csv`: Labels for tweets (IDs only) and whether they were annotated as "specific/nonspecific event"
- `majority_annotation_results.csv`: All labels for the tweets (IDs along with year and country)
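A minimal, untested rehydration sketch using tweepy (v4+) is shown below. The bearer token is a placeholder, and the `tweet_id` column name is an assumption about the CSV layout; check the file header first.

```python
# Hypothetical rehydration sketch: fetch tweet text for the released IDs.
# Assumes tweepy v4+ and Twitter API credentials; the "tweet_id" column
# name is a guess at the CSV layout.
import csv

import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

with open("data/labelled_tweets_is_protest_event.csv") as f:
    ids = [row["tweet_id"] for row in csv.DictReader(f)]

# The v2 tweet-lookup endpoint accepts at most 100 IDs per request.
for start in range(0, len(ids), 100):
    response = client.get_tweets(ids=ids[start:start + 100])
    for tweet in response.data or []:
        print(tweet.id, tweet.text)
```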
We evaluated ngram and embedding-based models on how well they can identify tweets discussing specific/nonspecific protests and riots (`/data/labelled_tweets_is_protest_event.csv`). See the above paper for details.
The trained models described below are in `/results`.
Ngram Models
The Keyword model and the Unigram model achieved F1 scores of 0.782 and 0.775, respectively.
- Code: `ngram_model.py`
- Run settings: `run_ngram_models.sh`
Note: these scripts handle both the general ngram and civil unrest-related keyword count models.
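For orientation, here is a minimal sketch of what a unigram count model can look like, assuming scikit-learn; the released implementation is `ngram_model.py` and may differ in features and classifier. The texts and labels below are toy stand-ins for hydrated tweets and their protest-event labels.

```python
# Hedged sketch of a unigram count baseline, assuming scikit-learn;
# the actual implementation in ngram_model.py may differ.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for hydrated tweet text and is_protest_event labels.
texts = [
    "protesters gathered downtown demanding reform",
    "riot police deployed near the capitol",
    "great weather for a picnic today",
    "new cafe opened on main street",
]
labels = [1, 1, 0, 0]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),  # unigram features
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["crowds are marching toward parliament"]))
```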
BERTweet Model
This model was not included in the final paper and is still being improved. It currently achieves an F1 of 0.814.
- Code: `bertweet_model.py`
- Run settings: `run_bertweet_model.sh`
Note: using a GPU for BERTweet is highly recommended.
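As a rough illustration (not the released code), one fine-tuning step with the Hugging Face transformers library might look like the following. The `vinai/bertweet-base` checkpoint and the toy batch are assumptions; `bertweet_model.py` is the authoritative implementation.

```python
# Hedged sketch of one BERTweet fine-tuning step for binary classification.
# Assumes the transformers library and the vinai/bertweet-base checkpoint;
# the released training code is bertweet_model.py.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "vinai/bertweet-base", use_fast=False, normalization=True
)
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=2
)
device = "cuda" if torch.cuda.is_available() else "cpu"  # GPU strongly recommended
model.to(device)

# Toy batch standing in for hydrated tweets and their protest-event labels.
batch = tokenizer(
    ["protesters gathered downtown", "lovely weather today"],
    padding=True, truncation=True, return_tensors="pt",
).to(device)
labels = torch.tensor([1, 0], device=device)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()  # wrap in a full training loop in practice
```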
Please email Alexandra DeLucia if you have any issues or questions ([email protected]).