A machine learning model to predict fake connections in Twitter data.
- The complete version of this project is available in this GitHub repository.
- An environment to run Jupyter Notebooks, such as the one provided by Project Jupyter. Jupyter Notebook requires Python.
The libraries required by this project are listed in each Jupyter Notebook. The main libraries used are:
- Python: 3.7.5
- TensorFlow: 2.0.0
- Keras: 2.2.4-tf
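As an optional check (not part of the original project), the sketch below compares the local environment against the versions listed above. It assumes TensorFlow is installed as the `tensorflow` package and reads the Keras version from `tf.keras`, which is why the expected string carries the `-tf` suffix. The helper names `installed_versions` and `version_mismatches` are hypothetical.

```python
import sys

# Versions listed in this README. The "-tf" suffix is how the Keras
# bundled with TensorFlow (tf.keras) reports its version.
EXPECTED_VERSIONS = {
    "python": "3.7.5",
    "tensorflow": "2.0.0",
    "keras": "2.2.4-tf",
}


def installed_versions():
    """Return the versions found in the current environment (None if missing)."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import tensorflow as tf
        versions["tensorflow"] = tf.__version__
        # Newer TensorFlow releases may not expose tf.keras.__version__.
        versions["keras"] = getattr(tf.keras, "__version__", None)
    except ImportError:
        # TensorFlow is not installed in this environment.
        versions["tensorflow"] = None
        versions["keras"] = None
    return versions


def version_mismatches(found, expected=EXPECTED_VERSIONS):
    """List the packages whose installed version differs from the README."""
    return [name for name, version in expected.items() if found.get(name) != version]


if __name__ == "__main__":
    found = installed_versions()
    for name in EXPECTED_VERSIONS:
        print("{:<10} expected {:<10} found {}".format(
            name, EXPECTED_VERSIONS[name], found[name]))
    mismatches = version_mismatches(found)
    print("OK" if not mismatches else "Differs: " + ", ".join(mismatches))
```

A mismatch does not necessarily mean the notebooks will fail; it only flags where the environment differs from the one the project was developed with.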
- 'data' folder: Contains the raw data for this project. The file 'train.txt' is not included; it is available from this Kaggle competition.
- 'data_processing': Contains the main files used to preprocess the raw data and generate structured files.
- 'data_generated': Contains the files generated in data preprocessing.
- 'data_models': Contains the dataset ready to be used by the machine learning models.
- 'predictions': Contains the predictions made by the different models.
- A research report is available here.
- data_models/dataset7.csv contains the most up-to-date dataset.
- models/neural_networks contains one of the most relevant models we obtained.
A full report detailing our findings can be found here.
This work is part of the subject COMP90051 Statistical Machine Learning, 2020 Semester 2, The University of Melbourne. Our group consisted of Alex González, Yuqing Xiao and Yee Hean Chuah. Our team name in the Kaggle competition was "50 cents".