vemchance/BDA-SemEval4

Big Data Analytics SemEval2024 Entry

Code for the Big Data Analytics research group SemEval2024 Task 4 entry.

This is our entry to SemEval2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes.

Paper: https://aclanthology.org/2024.semeval-1.20.pdf

This repository is a work in progress.

System Requirements

  • TODO detail system requirements here

Getting started

The code in this repository is split into multiple subdirectories:

  • EDA: Exploratory Data Analysis. Extra experiments not required to utilise our approach.
  • GoogleVision: Generates entities from image files. This is used as an input to the vision stream.
  • LateFusionEngine: The late-fusion engine that merges the outputs of the NLP and Vision streams using a per-label accuracy weighting system.
  • Subtask2a: Contains the training code, F1 hierarchy (Evaluation Code) and models used for Subtask2a.
  • Subtask2b: Contains the training code, models and post-evaluation models from Subtask2b.
  • Test Prediction Files: Test prediction files which can be used to test the late fusion engine.
  • word-embeddings: Some experiments with word embedding algorithms. These experiments informed the rest of the work, but are not required to use the approach detailed in our paper.
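
As a rough illustration of what per-label accuracy weighting can look like in a late-fusion setting, here is a minimal sketch. It is not the code in LateFusionEngine: the function names (`fuse_streams`, `predict_labels`), the 0.5 threshold, and the assumption that each stream's weight is its per-label accuracy on held-out data are all hypothetical and chosen for illustration only. See the README in that subdirectory and the paper for the actual method.

```python
from typing import Dict, List


def fuse_streams(
    nlp_probs: Dict[str, float],
    vision_probs: Dict[str, float],
    nlp_acc: Dict[str, float],
    vision_acc: Dict[str, float],
) -> Dict[str, float]:
    """Combine two streams' per-label scores into one score per label.

    Assumption (not taken from the repository): each stream's weight for a
    label is that stream's accuracy for the label on held-out data.
    """
    fused = {}
    for label, p_nlp in nlp_probs.items():
        p_vis = vision_probs[label]
        w_nlp = nlp_acc[label]
        w_vis = vision_acc[label]
        total = w_nlp + w_vis
        if total == 0:
            # No evidence that either stream is reliable: plain average.
            fused[label] = (p_nlp + p_vis) / 2
        else:
            # Accuracy-weighted average of the two streams' scores.
            fused[label] = (w_nlp * p_nlp + w_vis * p_vis) / total
    return fused


def predict_labels(fused: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the labels whose fused score reaches the (hypothetical) threshold."""
    return sorted(label for label, score in fused.items() if score >= threshold)


# Illustrative numbers only; they are not results from the paper.
nlp_probs = {"Smears": 0.9, "Loaded Language": 0.2}
vision_probs = {"Smears": 0.3, "Loaded Language": 0.6}
nlp_acc = {"Smears": 0.8, "Loaded Language": 0.5}
vision_acc = {"Smears": 0.2, "Loaded Language": 0.5}

fused = fuse_streams(nlp_probs, vision_probs, nlp_acc, vision_acc)
predicted = predict_labels(fused)
```

In this sketch, a stream that is more accurate for a given label has more influence on that label's fused score, while the other labels are weighted independently.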

Datasets

Task data can be accessed via the task page after registration. The entities extracted from Google Vision are available via this link. The augmented data described in our paper is generated by directly translating a sample of meme text using GPT-3.5. To access the original translated data used in our project, contact the lead author: [email protected].

Please visit the README.md file in each subdirectory for specific instructions on each subproject.

A common first step, though, is to clone this git repository:

git clone https://github.com/vemchance/BDA-SemEval4.git
cd BDA-SemEval4

TODO Finish this section of the README. We should include a high-level overview of the project and how to use it.

Architecture

TODO fill this out.

Pending Changes

  • Subtask2a: A modified set of code which allows easy inclusion, exclusion and stacking of models for all experiments, without manually modifying code.
  • Subtask2b: A modified set of code which allows easy inclusion, exclusion and stacking of models for all experiments without manual modification, including ResNet50 and a parameter sweep.
  • Integrating evaluation code which applies the F1 hierarchy after late fusion, or independently of it.

Contributing

TODO fill this out. I assume contributions are welcome after the challenge is finished. If so, we should say so here.

Licence

All code in this repository is licensed under the GNU Affero General Public License 3.0. choosealicense.com has a useful summary of this licence: https://choosealicense.com/licenses/agpl-3.0/

AGPL-3.0 was chosen as it provides strong copyleft protections to ensure transparency on future and derivative works. Open Science for the win!

TODO insert appropriate meme here