Code for the Big Data Analytics research group's SemEval2024 Task 4 entry.
This is our entry to SemEval2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes.
Paper: https://aclanthology.org/2024.semeval-1.20.pdf
This repository is a work in progress.
- TODO detail system requirements here
The code in this repository is split into multiple subdirectories:
EDA
: Exploratory Data Analysis. Extra experiments not required to utilise our approach.

GoogleVision
: Generates entities from image files. These entities are used as an input to the vision stream (see the sketch after this list).

LateFusionEngine
: The late-fusion engine that merges the outputs of the NLP and Vision streams using a per-label accuracy weighting system (see the sketch after this list).

Subtask2a
: Contains the training code, F1 hierarchy (evaluation code) and models used for Subtask2a.

Subtask2b
: Contains the training code, models and post-evaluation models from Subtask2b.

Test Prediction Files
: Test prediction files which can be used to test the late fusion engine.

word-embeddings
: Some experiments with word embedding algorithms. These experiments informed the rest of the work, but are not required to use the approach detailed in our paper.
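As a rough illustration of what the GoogleVision step does, the sketch below queries the Google Cloud Vision API for web entities and labels of a single meme image. The request types, output format and file names here are assumptions for the example; see the GoogleVision subdirectory for the actual pipeline.

```python
# Hypothetical sketch of the GoogleVision entity-extraction step.
# Assumes the google-cloud-vision client library is installed and
# GOOGLE_APPLICATION_CREDENTIALS points at a valid service-account key.
from google.cloud import vision


def extract_entities(image_path: str) -> list[str]:
    """Return entity/label descriptions for one meme image."""
    client = vision.ImageAnnotatorClient()

    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    # Web entities (people, events, named concepts recognised in the image).
    web = client.web_detection(image=image).web_detection
    entities = [e.description for e in web.web_entities if e.description]

    # Generic labels as a supplement.
    labels = client.label_detection(image=image).label_annotations
    entities += [lab.description for lab in labels]

    return entities


if __name__ == "__main__":
    print(extract_entities("example_meme.png"))  # placeholder file name
```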
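And as an illustrative (not definitive) sketch of the per-label accuracy weighting idea behind the LateFusionEngine: each stream's predicted probability for a label is weighted by that stream's validation accuracy on that label before the two are combined. The weighting scheme, threshold and variable names below are assumptions for the example only; the LateFusionEngine subdirectory contains the actual implementation.

```python
# Illustrative per-label weighted late fusion (not the exact implementation).
import numpy as np


def late_fuse(nlp_probs: np.ndarray,
              vision_probs: np.ndarray,
              nlp_label_acc: np.ndarray,
              vision_label_acc: np.ndarray,
              threshold: float = 0.5) -> np.ndarray:
    """Fuse two streams of per-label probabilities, shape (n_samples, n_labels).

    Each stream is weighted, per label, by its accuracy on a validation set,
    and the weights are normalised so they sum to 1 for every label.
    """
    weights = np.stack([nlp_label_acc, vision_label_acc])        # (2, n_labels)
    weights = weights / weights.sum(axis=0, keepdims=True)       # normalise per label
    fused = weights[0] * nlp_probs + weights[1] * vision_probs   # (n_samples, n_labels)
    return (fused >= threshold).astype(int)                      # multi-label decisions
```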
Task data can be accessed via the task page after registration. The entities extracted from Google Vision
are available via this link. The augmented data described in our paper is generated by directly translating a sample of meme text using GPT-3.5; to access the original translated data used in our project, contact the lead author: [email protected].
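For reference, a minimal sketch of how such GPT-3.5 translation could be driven through the OpenAI Python client is shown below. The model name, prompt wording and target language are illustrative assumptions; the exact prompts and settings used to build our augmented data are not reproduced in this repository.

```python
# Minimal sketch of translating meme text with GPT-3.5 via the OpenAI API.
# The prompt, model name and target language are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def translate_meme_text(text: str, target_language: str = "English") -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"Translate the following meme text into {target_language}. "
                        "Return only the translation."},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```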
Please visit the README.md file in each subdirectory for specific instructions on each subproject.
A common first step though is to clone this git repository:
```bash
git clone https://github.com/vemchance/BDA-SemEval4.git
cd BDA-SemEval4
```
TODO Finish this section of the README. We should include a high-level overview of the project and how to use it.
TODO fill this out.
- Subtask2a: A modified set of code which allows easy inclusion, exclusion and stacking of models for all experiments, without manually modifying code.
- Subtask2b: A modified set of code which allows easy inclusion, exclusion and stacking of models for all experiments without manual modification, including ResNet50 and a parameter sweep.
- Integrating the evaluation code, which applies the F1 hierarchy after (or independently of) late fusion; a sketch of the hierarchical F1 idea is given below.
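For context, the hierarchical F1 used for evaluation gives partial credit when a prediction hits an ancestor of the gold persuasion technique in the task's label hierarchy. The sketch below shows one common way this is computed (extending gold and predicted label sets with their ancestors before taking micro-F1). The `ANCESTORS` entries are placeholders, and the official scorer in Subtask2a should be treated as authoritative.

```python
# Illustrative hierarchical micro-F1 (the official Subtask2a scorer is authoritative).
# ANCESTORS maps each persuasion technique to its ancestors in the label
# hierarchy; the entries shown here are placeholders, not the real hierarchy.
ANCESTORS = {
    "Name calling/Labeling": ["Ad Hominem", "Ethos"],
    "Exaggeration/Minimisation": ["Distraction", "Logos"],
}


def expand(labels: set[str]) -> set[str]:
    """Add every ancestor of every label."""
    out = set(labels)
    for label in labels:
        out.update(ANCESTORS.get(label, []))
    return out


def hierarchical_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = expand(g), expand(p)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```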
TODO fill this out. I assume contributions are welcome after the challenge is finished. If so, we should say so here.
All code in this repository is licensed under the GNU Affero General Public License 3.0. choosealicense.com has a great summary of this license: https://choosealicense.com/licenses/agpl-3.0/
AGPL-3.0 was chosen because it provides strong copyleft protections, ensuring transparency for future and derivative works. Open Science for the win!
TODO insert appropriate meme here