xuanxuanx-98/ToxBias-ParallelCorpora

Dialectic Bias in Toxicity Detection

Project work fulfilling the AP requirements of module MS 1 (Specialization Area), as defined in the Modulhandbuch MA Linguistics PO 2018 at Heinrich-Heine-Universität Düsseldorf.

About: This project examines the performance and dialectic biases of the Google Perspective API in toxicity detection, using a Standard American English (SAE) corpus and four dialect corpora generated by converting the SAE sentences.
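As a rough illustration of the setup described above, the sketch below builds a Perspective API request body for the TOXICITY attribute, reads the summary score out of a response, and compares scores for parallel sentences. The helper names (`build_request`, `extract_toxicity`, `mean_gap`) are hypothetical and are not taken from this repository's scripts, and the mean score gap is only one simple way to summarize a bias; the project's own analysis may use different measures.

```python
from typing import Any, Dict, Sequence


def build_request(text: str, language: str = "en") -> Dict[str, Any]:
    """Build a comment-analysis request body asking only for TOXICITY."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response: Dict[str, Any]) -> float:
    """Read the summary TOXICITY score (0.0 to 1.0) from a response dict."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def mean_gap(sae_scores: Sequence[float], dialect_scores: Sequence[float]) -> float:
    """Mean per-sentence difference (dialect minus SAE) over parallel sentences.

    Assumption: both sequences are aligned, i.e. index i refers to the same
    source sentence in the SAE corpus and in the converted dialect corpus.
    """
    if not sae_scores or len(sae_scores) != len(dialect_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(d - s for s, d in zip(sae_scores, dialect_scores))
    return total / len(sae_scores)
```

A positive `mean_gap` would indicate that the dialect versions receive higher toxicity scores on average than their SAE counterparts. Sending the request still requires an API key and an HTTP client, which are left out here.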

Repository Structure

  • /data: the data used for this project, including the original and the synthetic datasets
  • /notebooks: intermediate workspaces for the project
  • /outputs: plots and analysis results output by Python scripts
  • /scores: toxicity scores returned by Perspective API, in batches
  • /scripts: implementation of the project
  • /text_compose: project report and its LaTeX source

How to Run

  • Install the runtime requirements listed in REQUIREMENTS.txt.
  • Make sure the data and scores are available in the /data and /scores folders.
  • Run the analysis scripts in /scripts.
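The requirements can typically be installed with `pip install -r REQUIREMENTS.txt`. For the second step, a small pre-flight check along the following lines may help confirm that the input folders are populated before any analysis script is started. This is a minimal sketch: the function `missing_inputs` is hypothetical and not part of the repository, and it only checks that each folder contains at least one file, not that the files are the expected ones.

```python
from pathlib import Path
from typing import List, Sequence


def missing_inputs(repo_root: str, folders: Sequence[str] = ("data", "scores")) -> List[str]:
    """Return the required folders that are absent or contain no files."""
    root = Path(repo_root)
    missing = []
    for name in folders:
        folder = root / name
        # A folder counts as ready if it exists and holds at least one file,
        # possibly inside a subfolder.
        has_files = folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))
        if not has_files:
            missing.append(name)
    return missing
```

Calling `missing_inputs(".")` from the repository root returns an empty list when both folders are ready, and otherwise lists the folders that still need to be filled.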

License

All source code is made available under the GPL-3.0 license. You may use, modify, and redistribute the code, without warranty, provided that you retain the copyright and license notices and distribute derivative works under the same license. See LICENSE for the full license text.

The contents of /text_compose are not open source. The author reserves all rights to that content.
