diff --git a/data/data_statement.md b/data/data_statement.md
new file mode 100644
index 0000000..116ac63
--- /dev/null
+++ b/data/data_statement.md
@@ -0,0 +1,30 @@
+We currently rely on three datasets for our research and modeling efforts:
+
+1. Waseem, Zeerak, and Dirk Hovy. "Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on
+Twitter." Proceedings of the NAACL Student Research Workshop, 2016. (See the
+[Hate speech](#hate-speech) section.)
+
+2. Anzovino, Maria, Elisabetta Fersini, and Paolo Rosso. "Automatic Identification and Classification of Misogynistic
+Language on Twitter." International Conference on Applications of Natural Language to Information Systems.
+Springer, Cham, 2018. (See the [Automatic Misogyny Identification](#automatic-misogyny-identification) section.)
+
+3. A dataset that we collected and labeled ourselves. See the [Our Annotations](#our-annotations) section for a full
+description of our process.
+
+
+These three datasets are combined into what we call the **gold dataset**.
+
+The next three sections describe how each dataset was collected and labeled, in the form of data statements
+([Bender and Friedman, 2018](https://www.aclweb.org/anthology/Q18-1041/)).
+
+# Hate speech
+to-do
+
+# Automatic Misogyny Identification
+to-do
+
+# Our Annotations
+to-do
+
+
+