This code was developed to take part in the IberEval 2017 competition.
In particular, it targets the Classification Of Spanish Election Tweets (COSET) task and the Stance and Gender Detection in Tweets on Catalan Independence task.
Task deadlines:
March 20th, 2017 Release of training data.
April 24th, 2017 Release of test data.
May 08th, 2017 Submission of runs.
May 15th, 2017 Evaluation results.
May 29th, 2017 Working notes due.
June 12th, 2017 Review to authors.
June 26th, 2017 Camera ready due.
Contacts:
Mail: [email protected]
Mailing list: [email protected]
Developers: Ambrosini Luca - [email protected]
Giancarlo Nicolò - [email protected]
Word vectors used: https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.es.vec
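
The `.vec` file linked above is in fastText's text format: a header line with the vocabulary size and the vector dimension, then one word per line followed by its vector components. The sketch below shows one way such a file could be parsed with the standard library only. The function name `load_vec` and the `limit` parameter are hypothetical and are not taken from this repository; the sample data is made up for illustration.

```python
import io


def load_vec(handle, limit=None):
    """Parse fastText's text .vec format into (dim, {word: [floats]}).

    `handle` is any open text file object; `limit` optionally caps the
    number of words read (hypothetical convenience parameter).
    """
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    vectors = {}
    for line in handle:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip rows that do not match the declared dimension
        vectors[word] = [float(v) for v in values]
    return dim, vectors


# Tiny in-memory sample standing in for wiki.es.vec (illustrative values).
sample = io.StringIO("2 3\nhola 0.1 0.2 0.3 \nvoto -0.5 0.0 0.25 \n")
dim, vectors = load_vec(sample)
```

For the real file, the same function could be called on `open("wiki.es.vec", encoding="utf-8")`, with `limit` set to keep memory use manageable.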
- Bag of words
  - tf-idf normalization
- Bag of n-grams
  - tf-idf normalization
- Word embeddings
  - Learned online
  - fastText es, improved online
  - fastText es, static
  - fastText ca, improved online
  - fastText ca, static
- N-grams
- Random forest
- Support Vector Machines
- Decision trees
- Multi Layer Perceptron
- CNN
- LSTM
- CNN+LSTM
- BI-LSTM
- KIM
- FAST-TEXT
- Stemming
- Remove stop-words
- Clean URLs
- Clean numbers
- Clean Twitter reserved words
- Tokenize mentions
- Tokenize smileys
- Tokenize emojis
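
The preprocessing steps listed above can be sketched with regular expressions as follows. This is a minimal illustration under assumptions, not the code used in this repository: the patterns, the placeholder tokens (`<mention>`, `<smiley>`, `<emoji>`), the reserved-word list, and the tiny stop-word set are all hypothetical. Stemming is left out because it needs an external stemmer (for example NLTK's `SnowballStemmer("spanish")`).

```python
import re

# All patterns below are illustrative assumptions, not the project's own.
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
RESERVED = re.compile(r"\b(?:RT|FAV)\b")          # Twitter reserved words
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")
SMILEY = re.compile(r"[:;=][-o^]?[)(DPp]")        # e.g. :) ;-) =D
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
STOP_WORDS = {"de", "la", "el", "en", "y", "a", "que"}  # tiny sample set


def preprocess(tweet):
    """Return a list of lowercase tokens with noise removed or replaced."""
    text = URL.sub(" ", tweet)                 # clean URLs
    text = RESERVED.sub(" ", text)             # clean Twitter reserved words
    text = MENTION.sub(" <mention> ", text)    # tokenize mentions
    text = SMILEY.sub(" <smiley> ", text)      # tokenize smileys
    text = EMOJI.sub(" <emoji> ", text)        # tokenize emojis
    text = NUMBER.sub(" ", text)               # clean numbers
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop words


tokens = preprocess("RT @usuario Vota el 26 de junio :) http://t.co/abc")
# tokens == ['<mention>', 'vota', 'junio', '<smiley>']
```

URLs are removed first so that the `:` in `http://` is never mistaken for the start of a smiley, and lowercasing happens last so that the case-sensitive reserved-word pattern still matches `RT`.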