alexantoniogonzalez2/predicting-twitter-fake-connections

Who are my friends?

A machine learning model to predict fake connections in Twitter data.

Requirements

  • The complete version of this project is available in this GitHub repository.
  • An environment to run Jupyter Notebooks, for example Project Jupyter. Jupyter Notebook itself requires Python.

Compatibility

The libraries needed for this project are specified in the Jupyter Notebooks. The main ones, with the versions used, are:

  • Python: 3.7.5
  • TensorFlow: 2.0.0
  • Keras: 2.2.4-tf
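
To compare a local environment against the versions listed above, a small check along the following lines may help. This is only a sketch: the helper names `installed_versions` and `report` are hypothetical and not part of this repository, and it assumes Keras is the copy bundled with TensorFlow (`tf.keras`), which is what the `-tf` suffix in the Keras version suggests.

```python
import sys

# Versions listed in this README.
EXPECTED = {"python": "3.7.5", "tensorflow": "2.0.0", "keras": "2.2.4-tf"}


def installed_versions():
    """Collect the versions found in the current environment."""
    found = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import tensorflow as tf  # imported lazily so the check works without it
    except ImportError:
        found["tensorflow"] = None
        found["keras"] = None
    else:
        found["tensorflow"] = tf.__version__
        # Assumption: Keras is the version bundled with TensorFlow.
        found["keras"] = getattr(tf.keras, "__version__", None)
    return found


def report(expected, found):
    """Return one human-readable line per expected component."""
    lines = []
    for name, want in expected.items():
        have = found.get(name)
        if have is None:
            status = "not installed"
        elif have == want:
            status = "matches"
        else:
            status = "differs"
        lines.append(f"{name}: expected {want}, found {have} ({status})")
    return lines


if __name__ == "__main__":
    for line in report(EXPECTED, installed_versions()):
        print(line)
```

A version reported as "differs" does not necessarily mean the notebooks will fail. It only flags that the environment is not the one the project was developed with.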

Structure

  • 'data' folder: Contains the raw data available for this project. The file 'train.txt' is not included; it is available from the Kaggle competition.
  • 'data_processing': Contains the main files to preprocess the raw data and generate files with a proper structure.
  • 'data_generated': Contains the files generated in data preprocessing.
  • 'data_models': Contains the dataset ready to be used by the machine learning models.
  • 'predictions': Contains the predictions made by the different models.
  • A research report is available here.
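
After cloning the repository, the layout described above can be checked with a short script such as the one below. It is a minimal sketch, not part of the project: the function name `missing_items` is hypothetical, and it assumes that 'train.txt', once downloaded from Kaggle, is placed inside the 'data' folder.

```python
from pathlib import Path

# Folder names taken from the structure list in this README.
EXPECTED_FOLDERS = [
    "data",
    "data_processing",
    "data_generated",
    "data_models",
    "predictions",
]


def missing_items(repo_root):
    """Return the expected folders and files that are absent under repo_root."""
    root = Path(repo_root)
    missing = [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
    # Assumption: train.txt is downloaded separately and saved under data/.
    if not (root / "data" / "train.txt").is_file():
        missing.append("data/train.txt")
    return missing


if __name__ == "__main__":
    absent = missing_items(".")
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All expected folders and train.txt are present.")
```

Run from the repository root, this lists anything that still needs to be created or downloaded before the notebooks are executed.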

Some key files

  • data_models/dataset7.cvs contains the most up-to-date dataset.
  • models/neural_netwoks contains one of the most relevant models obtained.

Further Analysis

A full report detailing our findings can be found here.

Context

This work is part of the subject COMP90051 Statistical Machine Learning, 2020 Semester 2, at The University of Melbourne. Our group consisted of Alex González, Yuqing Xiao and Yee Hean Chuah. Our team name in the Kaggle competition was 50 cents.
