citizenlab/distilbert-base-multilingual-cased-toxicity
This is a multilingual DistilBERT sequence classifier trained on the Jigsaw Toxic Comment Classification Challenge dataset.
How to use it
from transformers import pipeline
model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)
toxicity_classifier("this is a lovely message")
> [{'label': 'not_toxic', 'score': 0.9954179525375366}]
toxicity_classifier("you are an idiot and you and your family should go back to your country")
> [{'label': 'toxic', 'score': 0.9948776960372925}]
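The classifier returns one dictionary per input, with a `label` and a `score`, so a common follow-up step is to filter a batch of texts by label and confidence. The sketch below is one way to do that. The helper name `flag_toxic` and the default threshold of 0.9 are illustrative assumptions and not part of the model; the predictions are copied from the example outputs above so the snippet runs without downloading the model.

```python
from typing import Dict, List


def flag_toxic(texts: List[str], predictions: List[Dict], threshold: float = 0.9) -> List[str]:
    """Return the texts predicted as 'toxic' with a score at or above `threshold`.

    `flag_toxic` and the 0.9 default are hypothetical, chosen for illustration.
    """
    flagged = []
    for text, pred in zip(texts, predictions):
        if pred["label"] == "toxic" and pred["score"] >= threshold:
            flagged.append(text)
    return flagged


texts = [
    "this is a lovely message",
    "you are an idiot and you and your family should go back to your country",
]

# Predictions copied from the example outputs shown above.
predictions = [
    {"label": "not_toxic", "score": 0.9954179525375366},
    {"label": "toxic", "score": 0.9948776960372925},
]

print(flag_toxic(texts, predictions))
```

With a loaded pipeline, the hard-coded list could be replaced by `predictions = toxicity_classifier(texts)`, since the pipeline accepts a list of strings and returns one result per input.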
Evaluation
Accuracy
Accuracy Score = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
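To make the gap between the micro and macro F1 scores easier to interpret, here is a minimal sketch of how the three metrics are defined, computed on a small made-up label set. The toy data and the function name `classification_scores` are assumptions for illustration only; this is not the evaluation script or data behind the figures above. Macro F1 averages the per-class F1 scores with equal weight, so a weaker minority class (here `toxic`) pulls it down more than it affects micro F1.

```python
def classification_scores(y_true, y_pred, labels):
    """Return (accuracy, micro_f1, macro_f1) for single-label predictions."""
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn

    # Micro F1 pools the counts over all classes before computing F1.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro_f1 = 2 * tp_sum / micro_denom if micro_denom else 0.0
    # Macro F1 is the unweighted mean of the per-class F1 scores.
    macro_f1 = sum(per_class_f1) / len(labels)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true) if y_true else 0.0
    return accuracy, micro_f1, macro_f1


# Made-up, imbalanced toy data: 8 non-toxic and 2 toxic examples, one toxic example missed.
y_true = ["not_toxic"] * 8 + ["toxic"] * 2
y_pred = ["not_toxic"] * 8 + ["toxic", "not_toxic"]

accuracy, micro_f1, macro_f1 = classification_scores(y_true, y_pred, ["not_toxic", "toxic"])
print(f"accuracy={accuracy:.4f} micro_f1={micro_f1:.4f} macro_f1={macro_f1:.4f}")
# prints: accuracy=0.9000 micro_f1=0.9000 macro_f1=0.8039
```

In this toy single-label setting micro F1 coincides with accuracy. The reported figures differ slightly from each other, so they may have been computed under a setup that is not described here.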