CADD: A Large-scale Comprehensive Abusiveness Detection Dataset with Multifaceted Labels from Reddit
=======================================================
- Note
- 2021/09/05 New version update
- The data is in the CSV format (encoding='latin_1').
- Please MAKE SURE that you are fully aware of and agree to the ethical guidelines.
- Please DO NOT modify this file directly.
- Make no attempt to contact any user in the dataset.
- Make no attempt to deanonymize or learn the identity of any user in the dataset.
- Make no attempt to link users in the dataset with any external information (e.g., an account on another website).
- Do not share any portion of the data, including example posts or excerpts from posts, with any other party.
- Dataset overview

| Column | Values | Description |
| --- | --- | --- |
| Title | str: Context | Contextual information (Title + Body) |
| Body | str: Context | Contextual information (Title + Body) |
| Comment | str: Text | A target sentence to be classified. |
| L1: Type | {0,1,2,3} | 0: Non-abusive, 1: Hate speech, 2: Derogatory, 3: Profanity |
| L2: Abusiveness | {0,1} | 0: Non-abusive, 1: Abusive |
| L3: Target | {0,1} | 0: Non-targeted, 1: Targeted |
| L4: Demographic Characteristics | {0,1,2,3,4,5,6,7,8} | 0: None, 1: Gender, 2: Sexual orientation, 3: Race, 4: Religion, 5: Disability, 6: Age, 7: Others, 8: Unclear |
| L5: Implicitness | {0,1} | 0: None, 1: Implicit (Containing implicit attacks.) |
| L6: Profanity | {0,1} | 0: None, 1: Profanity (Containing any words expressing abusiveness.) |

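
The label codes above can be turned back into readable names after loading a file. The following is a minimal sketch, not the authors' loader: the function names `load_rows` and `decode_labels` are hypothetical, and the CSV header names `"L1"` and `"L4"` are assumptions, so check the header row of the actual file first. Only the `latin_1` encoding and the code-to-name mappings are taken from this README.

```python
import csv

# Code-to-name mappings copied from the overview table.
L1_TYPE = {0: "Non-abusive", 1: "Hate speech", 2: "Derogatory", 3: "Profanity"}
L4_DEMOGRAPHIC = {
    0: "None", 1: "Gender", 2: "Sexual orientation", 3: "Race", 4: "Religion",
    5: "Disability", 6: "Age", 7: "Others", 8: "Unclear",
}


def load_rows(path):
    """Read a CSV file into a list of dicts using the documented latin_1 encoding."""
    with open(path, newline="", encoding="latin_1") as f:
        return list(csv.DictReader(f))


def decode_labels(row, type_col="L1", demo_col="L4"):
    """Map the numeric L1 and L4 codes of one row to their names.

    ASSUMPTION: the default column names "L1" and "L4" are hypothetical;
    pass the real header names if they differ.
    """
    return {
        "type": L1_TYPE[int(row[type_col])],
        "demographic": L4_DEMOGRAPHIC[int(row[demo_col])],
    }
```

For example, `decode_labels({"L1": "1", "L4": "3"})` yields the names for hate speech targeting race. The binary labels (L2, L3, L5, L6) can be handled the same way with two-entry dictionaries.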
- Data statistics
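
| Type | Train | Validation | Test | Total |
| --- | --- | --- | --- | --- |
| Hate speech | 2,515 | 388 | 772 | 3,675 |
| Derogatory | 1,632 | 241 | 494 | 2,367 |
| Profanity | 4,595 | 631 | 1,339 | 6,565 |
| Non-abusive | 8,412 | 1,190 | 2,297 | 11,899 |
| All | 17,154 | 2,450 | 4,902 | 24,506 |
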
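
As a quick sanity check, the class and split proportions can be derived from the counts in the statistics above. This is an illustrative sketch only: the variable names are made up for the example, and the numbers are copied from this README rather than computed from the data files.

```python
# Per-type counts as (train, validation, test), copied from the statistics table.
COUNTS = {
    "Hate speech": (2515, 388, 772),
    "Derogatory": (1632, 241, 494),
    "Profanity": (4595, 631, 1339),
    "Non-abusive": (8412, 1190, 2297),
}
SPLITS = ("Train", "Validation", "Test")

# Total number of comments per type, summed over the three splits.
type_totals = {label: sum(splits) for label, splits in COUNTS.items()}
grand_total = sum(type_totals.values())

# Fraction of the whole dataset that each type accounts for.
type_share = {label: n / grand_total for label, n in type_totals.items()}

# Size of each split, summed over the four types, and its fraction of the whole.
split_totals = {
    name: sum(splits[i] for splits in COUNTS.values())
    for i, name in enumerate(SPLITS)
}
split_share = {name: n / grand_total for name, n in split_totals.items()}

if __name__ == "__main__":
    for label, share in type_share.items():
        print(f"{label}: {type_totals[label]} ({share:.1%})")
    for name, share in split_share.items():
        print(f"{name}: {split_totals[name]} ({share:.1%})")
```

Shares like these are a common starting point for choosing class weights or a stratified sampling scheme, since the four types are not evenly represented.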
https://aclanthology.org/2021.conll-1.43.pdf
@inproceedings{song2021large,
title={{A Large-scale Comprehensive Abusiveness Detection Dataset with Multifaceted Labels from Reddit}},
author={Song, Hoyun and Ryu, Soo Hyun and Lee, Huije and Park, Jong C},
booktitle={Proceedings of the 25th Conference on Computational Natural Language Learning},
pages={552--561},
year={2021}
}
These resources are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.