This repository contains the code and data for the IrokoBench and InjongoIntent projects:
- IrokoBench: a human-translated benchmark dataset for 16 typologically diverse, low-resource African languages, covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multiple-choice knowledge-based QA (AfriMMLU). The datasets are also available on Hugging Face.
- InjongoIntent: a slot-filling and intent-detection dataset for 16 African languages, covering domains such as banking (e.g. pay bill), home (e.g. play music), kitchen and dining (e.g. confirm reservation), travel (e.g. plug type), and utility (e.g. make call). We collected 3,200 sentences per language. The dataset paper, with full details, will be released later.

The project has been generously funded by Lacuna Fund.