# XLM-RoBERTa (base) fine-tuned on HC3 for ChatGPT text detection

XLM-RoBERTa (base) fine-tuned on the Hello-SimpleAI HC3 corpus for ChatGPT text detection.

All credit to Hello-SimpleAI for their great work!

F1 score on the test set: 0.9736
## The model

XLM-RoBERTa is a model pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. It was introduced in the paper *Unsupervised Cross-lingual Representation Learning at Scale* by Conneau et al. and first released in this repository.
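For orientation, the base checkpoint can be loaded with the standard `transformers` auto classes. A minimal sketch (the fine-tuned detector itself is loaded in the Usage section below):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the pre-trained (not yet fine-tuned) XLM-RoBERTa base checkpoint
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
```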
## The dataset

Human ChatGPT Comparison Corpus (HC3)

The first human-ChatGPT comparison corpus, built as the HC3 dataset by Hello-SimpleAI.

The dataset is introduced in the paper *How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection* (Guo et al., 2023).
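To inspect the corpus yourself, it can be loaded with the `datasets` library. A minimal sketch, assuming the "all" configuration name from the HC3 dataset card (newer `datasets` versions may additionally require `trust_remote_code=True`):

```python
from datasets import load_dataset

# Load the full HC3 corpus; "all" is assumed to be one of its configurations
hc3 = load_dataset("Hello-SimpleAI/HC3", "all")
print(hc3)
```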
## Metrics

| Metric | Value  |
|--------|--------|
| F1     | 0.9736 |
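For reference, the F1 score can be recomputed on any labeled split with scikit-learn. A minimal sketch; `texts` and `labels` are hypothetical placeholders for your own evaluation data, and the `LABEL_0`/`LABEL_1` output scheme is an assumption (check the model's `config.id2label` for the actual mapping):

```python
from sklearn.metrics import f1_score
from transformers import pipeline

detector = pipeline("text-classification", model="mrm8488/xlm-roberta-base-finetuned-HC3-mix")

# Hypothetical evaluation data: raw texts and gold labels (0 = human, 1 = ChatGPT)
texts = ["An example human-written answer...", "An example ChatGPT answer..."]
labels = [0, 1]

# Assumes the classifier emits default labels of the form "LABEL_<id>"
preds = [int(r["label"].split("_")[-1]) for r in detector(texts)]
print(f1_score(labels, preds))
```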
## Usage

```python
from transformers import pipeline

ckpt = "mrm8488/xlm-roberta-base-finetuned-HC3-mix"
detector = pipeline("text-classification", model=ckpt)

text = "Your text here..."
result = detector(text)
print(result)
```
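If you need class probabilities rather than the pipeline's top label, you can run the model directly. A minimal sketch; the meaning of each class index is an assumption, so check `model.config.id2label` for the actual mapping:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "mrm8488/xlm-roberta-base-finetuned-HC3-mix"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

inputs = tokenizer("Your text here...", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the classes; see model.config.id2label for the label names
probs = torch.softmax(logits, dim=-1).squeeze()
print(probs)
```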
## Citation

```bibtex
@misc{manuel_romero_2023,
  author    = {{Manuel Romero}},
  title     = {xlm-roberta-base-finetuned-HC3-mix (Revision b18de48)},
  year      = 2023,
  url       = {https://huggingface.co/mrm8488/xlm-roberta-base-finetuned-HC3-mix},
  doi       = {10.57967/hf/0306},
  publisher = {Hugging Face}
}
```