---
license: cc-by-4.0
language:
- en
pipeline_tag: text-classification
tags:
- RoBERTa-large
- topic
- news
---

# Fine-tuned RoBERTa-large for detecting news on crime

# Model Description

This model is a fine-tuned RoBERTa-large that classifies whether a news article is about crime.

# How to Use

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="dell-research-harvard/topic-crime")
classifier("Man robs bank")
```
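To classify several texts at once and turn the output into a yes/no flag, a minimal sketch along the following lines may help. It relies on the standard `text-classification` pipeline output (one `{"label", "score"}` dictionary per input). The helper names `load_classifier` and `flag_crime`, the 0.5 threshold, and the positive label name `"LABEL_1"` are assumptions made for illustration and are not documented in this card; check `classifier.model.config.id2label` for the labels the model actually emits.

```python
MODEL_ID = "dell-research-harvard/topic-crime"


def load_classifier():
    """Load the model from the Hub (downloads the weights on first call)."""
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def flag_crime(texts, classifier, positive_label="LABEL_1", threshold=0.5):
    """Classify a list of texts and add a boolean `is_crime` field.

    `positive_label` is an ASSUMED label name; verify it against
    `classifier.model.config.id2label` before relying on the flag.
    """
    # The pipeline returns one {"label": ..., "score": ...} dict per input text.
    predictions = classifier(texts)
    return [
        {
            "text": text,
            "label": pred["label"],
            "score": pred["score"],
            "is_crime": pred["label"] == positive_label and pred["score"] >= threshold,
        }
        for text, pred in zip(texts, predictions)
    ]
```

Typical usage would be `flag_crime(["Man robs bank", "Mayor opens new park"], load_classifier())`. Passing the classifier in as an argument keeps the model load (which is slow) separate from the per-batch logic.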

# Training data

The model was trained on a hand-labelled sample of data from the [NEWSWIRE dataset](https://huggingface.co/datasets/dell-research-harvard/newswire).

| Split | Size |
|-------|------|
| Train | 463  |
| Dev   | 98   |
| Test  | 98   |

# Test set results

| Metric    | Result |
|-----------|--------|
| F1        | 0.9041 |
| Accuracy  | 0.9286 |
| Precision | 0.8919 |
| Recall    | 0.9167 |
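As a sanity check on how these four metrics relate, the sketch below recomputes them from a confusion matrix. The counts used (33 true positives, 4 false positives, 3 false negatives, 58 true negatives, summing to the 98 test examples) are inferred from the rounded figures above and are an assumption; the card does not report the underlying counts.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"f1": f1, "accuracy": accuracy, "precision": precision, "recall": recall}


# ASSUMED counts, inferred from the reported metrics and a test set of 98 examples.
metrics = metrics_from_counts(tp=33, fp=4, fn=3, tn=58)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the script reproduces the reported values to four decimal places, which suggests the positive (crime) class makes up roughly a third of the test set.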

# Citation Information

If you use this model, please cite:

```
@misc{silcock2024newswirelargescalestructureddatabase,
      title={Newswire: A Large-Scale Structured Database of a Century of Historical News},
      author={Emily Silcock and Abhishek Arora and Luca D'Amico-Wong and Melissa Dell},
      year={2024},
      eprint={2406.09490},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.09490},
}
```

# Applications

We applied this model to a century of historical news articles. You can see all the classifications in the [NEWSWIRE dataset](https://huggingface.co/datasets/dell-research-harvard/newswire).