kurdish-twitter-data

Kurdish Twitter data repository for the Kurmanji and Sorani dialects.

This dataset includes a total of 29011 Kurmanji and 29010 Sorani tweets.

  • Each line contains the text of a single tweet
  • No duplicates: each text entry is unique
  • User mentions and URLs are replaced with USER_ID and URL, respectively
  • All newline characters inside tweets are removed, which is what guarantees the one-tweet-per-line format described in the first rule
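The rules above could be reproduced roughly as follows. This is a minimal Python sketch, not the authors' preprocessing script. The helper names (`normalize_tweet`, `build_corpus`, `load_tweets`) and the regular expressions used to detect mentions and URLs are hypothetical choices made for illustration.

```python
import re

# Assumed patterns (not taken from the dataset authors): a mention is "@"
# followed by word characters; a URL starts with http(s):// or www. and
# runs to the next whitespace. In Python 3, \w also matches Kurdish letters.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")
NEWLINE_RE = re.compile(r"[\r\n]+")


def normalize_tweet(text: str) -> str:
    """Hypothetical helper: apply the replacement and newline rules to one tweet."""
    text = URL_RE.sub("URL", text)  # replace URLs first so "@" inside a URL is not treated as a mention
    text = MENTION_RE.sub("USER_ID", text)
    # Remove line breaks so the tweet fits on a single line.
    return NEWLINE_RE.sub(" ", text).strip()


def build_corpus(raw_tweets):
    """Hypothetical helper: normalize tweets and keep only the first copy of each text."""
    seen = set()
    corpus = []
    for raw in raw_tweets:
        tweet = normalize_tweet(raw)
        if tweet and tweet not in seen:
            seen.add(tweet)
            corpus.append(tweet)
    return corpus


def load_tweets(path):
    """Hypothetical helper: read a one-tweet-per-line file and check that entries are unique."""
    # Iterating over a text file splits only on \n, \r and \r\n, so other
    # Unicode line separators inside a tweet do not split it.
    with open(path, encoding="utf-8") as f:
        tweets = [line.rstrip("\r\n") for line in f]
    tweets = [t for t in tweets if t.strip()]
    if len(tweets) != len(set(tweets)):
        raise ValueError(f"{path} contains duplicate tweets")
    return tweets
```

For example, `build_corpus(["Silav @someone!\nhttps://example.com", "Silav @other!\nhttp://t.co/x"])` returns `["Silav USER_ID! URL"]`. Both raw tweets normalize to the same text, so only one copy is kept. Deduplicating *after* normalization mirrors the README's statement that each text entry is unique. Whether the original dataset was deduplicated before or after replacement is not stated, so this ordering is an assumption. To read one of the dataset files, call `load_tweets` on it. The actual filenames in the repository are not given here.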