
miguel-kjh/Analysis-of-tweets


Analysis-of-tweets

This repository contains several models that analyze the opinions travelers post on Twitter. The data comes from a Kaggle competition run by a Spanish airline. The data has been preprocessed, and various techniques have been tried on it.

Machine Learning

  • Bag of words (TF-IDF)
  • Random Forest
  • GaussianNB
  • XGBoost
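
To make the bag-of-words (TF-IDF) representation concrete, here is a minimal sketch in plain Python. It is not the repository's code: the tokenizer, the smoothed IDF formula, the function names, and the three sample tweets are all assumptions made up for illustration. In practice a library vectorizer would normally feed classifiers such as Random Forest, GaussianNB, or XGBoost.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive (assumption): lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no term gets weight zero.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Return an L2-normalised {term: weight} mapping for one document."""
    # Terms never seen while fitting are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Invented example tweets, not taken from the competition data.
tweets = [
    "vuelo cancelado sin aviso",
    "vuelo puntual y buen servicio",
    "servicio pésimo vuelo retrasado",
]
idf = fit_idf(tweets)
vec = tfidf_vector(tweets[0], idf)
print(vec)
```

In this toy corpus "vuelo" occurs in every tweet, so it receives a lower weight than "cancelado", which occurs only once. That down-weighting of uninformative words is the point of TF-IDF over raw counts.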

Deep Learning

  • Word embedding (GloVe)
  • CNN with kernel = 1: this technique is explained in a video.
  • fastText: a simple and efficient model for text classification.
  • BETO: the BERT model trained for Spanish.
  • GRUs: gated recurrent units.
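
As a rough illustration of the "CNN with kernel = 1" idea, the sketch below reads it as a 1D convolution whose kernel covers a single token, which amounts to applying the same linear map to every token embedding independently. The ReLU, the global max pooling, and all numbers and function names are assumptions for the example, not the architecture used in this repository.

```python
def conv1d_kernel1(embeddings, weights, bias):
    """Apply a kernel-size-1 convolution to a sequence of token embeddings.

    embeddings: list of tokens, each a list of in_dim floats
    weights:    out_dim filters, each a list of in_dim floats
    bias:       list of out_dim floats
    """
    # With kernel size 1, each output position depends on one token only.
    return [
        [sum(w * x for w, x in zip(filt, token)) + b for filt, b in zip(weights, bias)]
        for token in embeddings
    ]


def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map):
    # Keep the strongest response of each filter over all token positions.
    return [max(col) for col in zip(*feature_map)]


# Toy input: 3 tokens with 2-dimensional embeddings (made-up numbers).
sentence = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
filters = [[1.0, 1.0], [1.0, -1.0]]  # 2 filters with hypothetical weights
bias = [0.0, 0.5]

features = global_max_pool(relu(conv1d_kernel1(sentence, filters, bias)))
print(features)
```

Because each filter looks at one token at a time and the pooling takes a maximum over positions, the resulting feature vector does not depend on word order. This keeps the model small and fast, at the cost of ignoring local word context.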

Results

All models went through a fine-tuning process to get the best performance from each of them.

Balanced Data

Figure 1: Results of the experiment for a balanced dataset

Unbalanced Data

Figure 2: Results of the experiment for an unbalanced dataset
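
This README does not say how the balanced dataset was built. One common option, shown here purely as an assumption, is random undersampling: samples are dropped from the larger classes until every class is as small as the rarest one. The function name, the seed, and the label counts below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples until every class is as small as the rarest one."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    balanced = []
    for label, group in by_label.items():
        balanced.extend((s, label) for s in rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical label distribution, not the repository's data.
labels = ["negative"] * 6 + ["neutral"] * 3 + ["positive"] * 2
samples = [f"tweet {i}" for i in range(len(labels))]

balanced = undersample(samples, labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling discards data, which matters with a small corpus. Oversampling the minority classes or using class weights are alternatives with the opposite trade-off.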

Conclusion

As the figures show, the connectionist approach (deep learning) gives better results on both datasets, and with balanced data the models reach 80% accuracy. CNN and fastText are fast and effective methods. The transformer, despite being more powerful, falls below these two methods. I suspect the reason is that transformers are designed for large volumes of data: with only 7000 samples, as is the case here, they give good results, but not as impressive as in other applications.

Technologies and Libraries

About

Sentiment analysis of airline tweets