effervescent-shot/Trend-Topic-Categorization

Trend-Topic-Analysis

This is a semester project from the Distributed Information Systems Laboratory (LSIR) at EPFL, carried out in spring 2020.

Data analysis and model training can be performed on an ordinary computer, or in parallel if the underlying architecture supports it.

Project Aim

This work set out to determine what people talk about on Twitter each day, grouped into high-level topics. We collected and analysed a large dataset of tweets along with the daily trending topics, clustered them, and categorised the trending topics into conventional media categories using LDA (Latent Dirichlet Allocation). As a result, we created a clean dataset of tweets matched with their trending topics and their general categories. The keywords that LDA produces along the way can also be used in the future to summarise, search, or describe a particular category of news or texts.
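The categorisation step described above can be sketched in plain Python. This is only an illustration, not the project's code: the README does not say which LDA library or hyperparameters were used, so the sketch fits LDA with a small collapsed Gibbs sampler. The function names (`lda_gibbs`, `top_keywords`, `dominant_topic_per_trend`), the hyperparameter values, and the toy tweets are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = [rng.randrange(num_topics) for _ in doc]
        assignments.append(topics)
        for word, z in zip(doc, topics):
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_keywords(topic_word, n=3):
    """Return up to n of the most frequent words of each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


def dominant_topic_per_trend(trends, doc_topic):
    """Sum topic counts over each trend's tweets and keep the largest topic."""
    totals = defaultdict(lambda: [0] * len(doc_topic[0]))
    for trend, counts in zip(trends, doc_topic):
        for k, c in enumerate(counts):
            totals[trend][k] += c
    return {trend: max(range(len(c)), key=c.__getitem__) for trend, c in totals.items()}


# Hypothetical, already tokenized tweets paired with their trending topic.
tweets = [
    ("#finals", ["team", "goal", "match", "win"]),
    ("#finals", ["match", "coach", "team", "score"]),
    ("#election", ["vote", "party", "election", "poll"]),
    ("#election", ["poll", "vote", "candidate", "party"]),
]
trends = [trend for trend, _ in tweets]
docs = [tokens for _, tokens in tweets]

doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_keywords(topic_word))                      # keywords per topic
print(dominant_topic_per_trend(trends, doc_topic))   # topic index per trend
```

Topic indices returned by LDA carry no meaning of their own. In this sketch, attaching a media category name to each index is assumed to be a manual step guided by the per-topic keywords.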

File Structure

For security reasons, Data is not published.

Only the best model is published under LDA, along with the model-checking scripts.

Notebooks are for data preprocessing and cleanup as well as reporting; they are numbered in step order.

Scripts contains all the code written in Notebooks, with proper comments.

The Papers folder contains articles read during the project; not all of them are cited in the report.
