
Neural networks (CNN, LSTM, MLP) for tweet classification using fastText word vectors

Ambros94/IberEval2017

IberEval2017

This code was developed to take part in the IberEval2017 competition.

In particular, it targets two tasks: Classification Of Spanish Election Tweets (COSET) and Stance and Gender Detection in Tweets on Catalan.

Task deadlines:

  • March 20th, 2017: release of training data
  • April 24th, 2017: release of test data
  • May 8th, 2017: submission of runs
  • May 15th, 2017: evaluation results
  • May 29th, 2017: working notes due
  • June 12th, 2017: reviews sent to authors
  • June 26th, 2017: camera-ready due

Contacts:

  • Mail: [email protected]
  • Mailing list: [email protected]
  • Developers:

    • Ambrosini Luca - [email protected]
    • Giancarlo Nicolò - [email protected]

Word vectors used: https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.es.vec
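
The .vec file is plain text. Its first line holds the vocabulary size and the vector dimension, and each following line is a word followed by its components. A minimal stdlib-only sketch of reading such a file and building an embedding matrix for a given vocabulary might look as follows. The function names, the `word_index` mapping, and the convention of reserving row 0 for padding (common in Keras-style pipelines) are assumptions for illustration, not this repository's code.

```python
import io
from typing import Dict, List, Optional, Set, TextIO, Tuple


def load_vec(stream: TextIO, vocab: Optional[Set[str]] = None) -> Tuple[int, Dict[str, List[float]]]:
    """Parse fastText .vec text: a "<num_words> <dim>" header, then "word v1 ... v_dim" lines.

    Passing `vocab` keeps only the words we need, since wiki.es.vec is very large.
    """
    dim = int(stream.readline().split()[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines (e.g. tokens that contain spaces)
        word = parts[0]
        if vocab is None or word in vocab:
            vectors[word] = [float(x) for x in parts[1:]]
    return dim, vectors


def build_embedding_matrix(word_index: Dict[str, int],
                           vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the vector of the word whose index is i.

    Hypothetical convention: indices start at 1 and row 0 is reserved for padding.
    Words missing from the .vec file keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix


# Toy .vec content; rstrip() also handles a trailing space after the last value.
sample = io.StringIO("3 2\nhola 0.1 0.2 \nvoto -0.5 1.0 \nurna 0.0 0.3 \n")
dim, vecs = load_vec(sample, vocab={"hola", "voto", "elecciones"})
matrix = build_embedding_matrix({"hola": 1, "voto": 2, "elecciones": 3}, vecs, dim)
print(matrix)  # [[0.0, 0.0], [0.1, 0.2], [-0.5, 1.0], [0.0, 0.0]]
```

Such a matrix could initialise an embedding layer that is either kept frozen or updated during training. Reading "static" and "improved online" below as those two options is our interpretation of the labels.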

Tweet representations:

  • Bag of words

    • tf-idf normalization
  • Bag of n-grams

    • tf-idf normalization
  • Word embeddings

    • Learned online
    • fastText es (Spanish), improved online
    • fastText es (Spanish), static
    • fastText ca (Catalan), improved online
    • fastText ca (Catalan), static
  • N-grams
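
The following small pure-Python sketch illustrates the bag-of-n-grams representation with tf-idf normalization. It uses raw counts for tf and the smoothed idf log((1 + N) / (1 + df)) + 1, which is the scikit-learn `TfidfVectorizer` default. It then L2-normalizes each tweet vector. Whether the repository used exactly this variant is an assumption. With `n_min = n_max = 1` the same code yields a plain tf-idf bag of words.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_min: int = 1, n_max: int = 2) -> List[str]:
    """All contiguous word n-grams with n_min <= n <= n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs: List[List[str]], n_min: int = 1, n_max: int = 2) -> List[Dict[str, float]]:
    """Sparse tf-idf vectors (dicts), one per tokenized tweet, each L2-normalized."""
    bags = [Counter(ngrams(doc, n_min, n_max)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct n-gram once.
    df = Counter(gram for bag in bags for gram in bag)
    vectors = []
    for bag in bags:
        weights = {
            gram: count * (math.log((1 + n_docs) / (1 + df[gram])) + 1)
            for gram, count in bag.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # avoid /0 on empty tweets
        vectors.append({gram: w / norm for gram, w in weights.items()})
    return vectors


docs = [["vota", "pp"], ["vota", "psoe"], ["no", "vota"]]
vectors = tfidf(docs)
# "vota" occurs in every tweet, so it weighs less than the rarer "pp".
print(vectors[0]["vota"] < vectors[0]["pp"])  # True
```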

Classifiers

  • Random forest
  • Support Vector Machines
  • Decision trees

Deep neural models

  • Multi-Layer Perceptron (MLP)
  • CNN
  • LSTM
  • CNN+LSTM
  • Bi-LSTM
  • KIM
  • fastText
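
Assuming "KIM" refers to Kim's (2014) CNN for sentence classification, its core operation is a 1D convolution over the sequence of word vectors, followed by max-over-time pooling. This produces one feature per filter. The plain-Python sketch below shows only that forward step, with no training and no framework. The function name and the toy numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_max_pool(embeddings: Matrix, filters: List[Matrix], biases: List[float]) -> List[float]:
    """One feature per filter.

    Each filter (width x dim) slides over the token sequence. ReLU is applied,
    then the maximum over all positions is kept.
    """
    features = []
    for filt, b in zip(filters, biases):
        width = len(filt)
        # ReLU outputs are >= 0, so 0.0 is a safe starting max; it is also
        # the result when the tweet is shorter than the filter.
        best = 0.0
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            s = b + sum(
                w * x
                for f_row, e_row in zip(filt, window)
                for w, x in zip(f_row, e_row)
            )
            best = max(best, max(0.0, s))
        features.append(best)
    return features


emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, embedding dim 2
filters = [[[1.0, 0.0], [0.0, 1.0]],         # width-2 filter
           [[-1.0, -1.0]]]                   # width-1 filter
feats = conv1d_max_pool(emb, filters, [0.0, 0.0])
print(feats)  # [2.0, 0.0]
```

In Kim's architecture, the pooled features from filters of several widths are concatenated and passed, with dropout, to a softmax output layer.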

Pre-processing

  • Stemming
  • Remove stop-words
  • Clean URLs
  • Clean numbers
  • Clean Twitter reserved words
  • Tokenize mentions
  • Tokenize smileys
  • Tokenize emojis
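
Below is a regex-based sketch of several of the steps above. It applies them in this order: URLs, reserved words, mentions, smileys, emojis and numbers, followed by lowercasing, tokenizing and optional stop-word removal. The patterns, the placeholder tokens (`<mention>`, `<smiley>`, `<emoji>`) and the stop-word set are hypothetical. The emoji class covers only a few common Unicode blocks. Stemming is left out because it needs a language-specific stemmer.

```python
import re
from typing import List, Optional, Set

# Hypothetical patterns and placeholder tokens, for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
RESERVED_RE = re.compile(r"\b(?:RT|FAV)\b")  # Twitter reserved words (assumed set)
MENTION_RE = re.compile(r"@\w+")
SMILEY_RE = re.compile(r"[:;=]-?[)(DPp]")    # e.g. :) ;-) :D =P
# Approximate emoji class: a few common Unicode blocks only.
EMOJI_RE = re.compile("[\U0001F300-\U0001F6FF\u2600-\u27BF]")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def preprocess(tweet: str, stop_words: Optional[Set[str]] = None) -> List[str]:
    text = URL_RE.sub(" ", tweet)               # clean URLs
    text = RESERVED_RE.sub(" ", text)           # clean Twitter reserved words
    text = MENTION_RE.sub(" <mention> ", text)  # tokenize mentions
    text = SMILEY_RE.sub(" <smiley> ", text)    # tokenize smileys
    text = EMOJI_RE.sub(" <emoji> ", text)      # tokenize emojis
    text = NUMBER_RE.sub(" ", text)             # clean numbers
    # Keep placeholders intact; otherwise split on non-word characters.
    tokens = re.findall(r"<\w+>|\w+", text.lower())
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]  # remove stop-words
    return tokens


print(preprocess("RT @PSOE: Hoy votamos 26 :) https://t.co/abc \U0001F600", stop_words={"hoy"}))
# ['<mention>', 'votamos', '<smiley>', '<emoji>']
```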
