Toxic Classifiers developed for the GATE Cloud. Two models are available:
- kaggle: trained on the Kaggle Toxic Comments Challenge dataset.
- olid: trained on the OLIDv1 dataset from OffensEval 2019 (paper).
We fine-tuned a RoBERTa-base model using the simpletransformers toolkit.
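For illustration, a fine-tuning run with simpletransformers might look like the sketch below. The training file path, column names and hyperparameters are assumptions, not the exact configuration used for the released models.

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Hypothetical training data: a CSV with "text" and "labels" columns
# (0 = non-toxic, 1 = toxic). The path and column names are assumptions.
train_df = pd.read_csv("data/train.csv")[["text", "labels"]]

# Illustrative hyperparameters only.
model_args = ClassificationArgs(
    num_train_epochs=1,
    train_batch_size=16,
    overwrite_output_dir=True,
    output_dir="models/en/kaggle",
)

# Fine-tune roberta-base as a binary classifier.
model = ClassificationModel("roberta", "roberta-base", num_labels=2,
                            args=model_args, use_cuda=False)
model.train_model(train_df)
```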
Dependencies:

- python=3.8
- pandas
- tqdm
- pytorch
- simpletransformers
To install them, use either conda or pip:

- conda: `conda env create -f environment.yml`
- pip: `pip install -r requirements.txt`
(If the above does not work, or if you want to use GPUs, you can follow the installation steps of simpletransformers: https://simpletransformers.ai/docs/installation/)
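As a quick sanity check after installation, you can confirm that the core libraries import and report their versions (this snippet is just a convenience, not part of the repository):

```python
# Quick sanity check that the core dependencies installed correctly.
from importlib.metadata import version

import torch
import simpletransformers  # verifies the package imports without errors

print("torch:", torch.__version__)
print("simpletransformers:", version("simpletransformers"))
```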
To obtain the trained models:

- Download the models from the latest release of this repository (currently available: `kaggle.tar.gz`, `olid.tar.gz`).
- Decompress each file inside `models/en/` (which will create `models/en/kaggle` or `models/en/olid` respectively); a Python alternative to `tar` is sketched below.
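If you prefer to do the extraction from Python rather than with `tar`, a minimal sketch (assuming the archive has already been downloaded to the current directory) is:

```python
import tarfile
from pathlib import Path

# Assumes kaggle.tar.gz (or olid.tar.gz) is in the current directory.
archive = "kaggle.tar.gz"
target = Path("models/en")
target.mkdir(parents=True, exist_ok=True)

# Extracting into models/en/ creates models/en/kaggle (or models/en/olid).
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(path=target)
```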
python __main__.py -t "This is a test"
(should return 0 = non-toxic)
python __main__.py -t "Bastard!"
(should return 1 = toxic)
Options:

- `t`: text
- `l`: language (currently only supports "en")
- `c`: classifier (currently supports "kaggle" and "olid" -- default="kaggle")
- `g`: gpu (default=False)
The output consists of the predicted class and the probabilities of each class.
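For orientation, the snippet below sketches how a CLI with these options and this output could be put together with simpletransformers. It is not the actual `__main__.py` of this repository; the flag wiring, the model directory layout and the softmax step are assumptions.

```python
import argparse

import numpy as np
from simpletransformers.classification import ClassificationModel

parser = argparse.ArgumentParser(description="Toxic comment classifier")
parser.add_argument("-t", "--text", required=True, help="text to classify")
parser.add_argument("-l", "--language", default="en", help='language (currently only "en")')
parser.add_argument("-c", "--classifier", default="kaggle", choices=["kaggle", "olid"],
                    help="classifier to use")
parser.add_argument("-g", "--gpu", action="store_true", help="run on GPU (default: CPU)")
args = parser.parse_args()

# Model directory assumed from the layout created by the download step above.
model = ClassificationModel("roberta", f"models/{args.language}/{args.classifier}",
                            use_cuda=args.gpu)

# predict() returns the predicted labels and the raw (pre-softmax) outputs;
# a softmax turns the latter into per-class probabilities.
predictions, raw_outputs = model.predict([args.text])
probabilities = np.exp(raw_outputs[0]) / np.exp(raw_outputs[0]).sum()
print(predictions[0], probabilities)  # e.g. 0 [p(non-toxic), p(toxic)]
```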
Pre-built Docker images are available for a REST service that accepts text and returns a classification according to the relevant model - see the "packages" section for more details.
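As an illustration only, calling such a service from Python might look like the snippet below; the port, endpoint path and payload/response format are hypothetical, so check the package documentation for the actual interface.

```python
import requests

# The endpoint, port and payload key here are hypothetical; consult the
# package documentation for the real interface of the Dockerised service.
response = requests.post("http://localhost:8080/classify",
                         json={"text": "This is a test"})
response.raise_for_status()
print(response.json())
```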