emilys committed on
Commit
914bea1
1 Parent(s): 853322a

Create README.md

Files changed (1)
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ license: cc-by-4.0
+ language:
+ - en
+ pipeline_tag: text-classification
+ tags:
+ - distilroberta
+ - topic
+ - news
+ ---
+
+ # Fine-tuned DistilRoBERTa-base for detecting news on crime
+
+ # Model Description
+
+ This model is a fine-tuned DistilRoBERTa-base that classifies whether news articles are about crime.
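+
+ # How to Use
+
+ ```python
+ from transformers import pipeline
+ # Load the fine-tuned crime classifier from the Hugging Face Hub
+ classifier = pipeline("text-classification", model="dell-research-harvard/topic-crime")
+ # Returns a list with one dict per input, each holding a "label" and a "score"
+ print(classifier("Man robs bank"))
+ ```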
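To screen many articles at once, the pipeline output can be post-processed with a small helper. The following is a minimal sketch, not part of the original model card. The helper name `select_crime_articles`, the label string `"LABEL_1"`, and the 0.5 score cut-off are assumptions made for illustration, and the predictions are hand-written stand-ins rather than real model output.

```python
from typing import Dict, List, Union

# ASSUMPTION: the positive ("crime") class is exposed as "LABEL_1".
# Check classifier.model.config.id2label for the label names actually used.
POSITIVE_LABEL = "LABEL_1"


def select_crime_articles(
    texts: List[str],
    predictions: List[Dict[str, Union[str, float]]],
    positive_label: str = POSITIVE_LABEL,
    min_score: float = 0.5,
) -> List[str]:
    """Keep the texts predicted as positive with a score of at least min_score."""
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == positive_label and pred["score"] >= min_score
    ]


# Hand-written stand-in for pipeline output (same shape, NOT real model output)
headlines = ["Man robs bank", "Senate passes farm bill"]
mock_predictions = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.95},
]

crime_headlines = select_crime_articles(headlines, mock_predictions)
print(crime_headlines)
```

With the real classifier, `mock_predictions` would be replaced by `classifier(headlines)`, which returns one `{"label", "score"}` dict per input text.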
+
+ # Training data
+
+ The model was trained on a hand-labelled sample of data from the [NEWSWIRE dataset](https://huggingface.co/datasets/dell-research-harvard/newswire).
+
+ Split|Size
+ -|-
+ Train|463
+ Dev|98
+ Test|98
+
+ # Test set results
+
+ Metric|Result
+ -|-
+ F1|0.9041
+ Accuracy|0.9286
+ Precision|0.8919
+ Recall|0.9167
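The four reported metrics can be checked against each other with standard binary-classification formulas. The sketch below uses one set of confusion-matrix counts that reproduces the table on the 98-article test split. These counts are inferred from the reported numbers and are not stated in the model card. It also assumes the metrics were computed for the positive ("crime") class.

```python
# ASSUMPTION: counts inferred from the reported metrics and the 98-article
# test split; they are NOT reported in the model card itself.
tp, fp, fn, tn = 33, 4, 3, 58

precision = tp / (tp + fp)                           # share of predicted crime articles that are crime
recall = tp / (tp + fn)                              # share of crime articles that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tp + fp + fn + tn)           # share of all articles classified correctly

print(f"F1 {f1:.4f}, Accuracy {accuracy:.4f}, Precision {precision:.4f}, Recall {recall:.4f}")
```

Under these assumed counts the four values round to the figures in the table, which suggests the reported results are internally consistent.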
+
+ # Citation Information
+
+ If you use this model, please cite the NEWSWIRE paper:
+
+ ```
+ @misc{silcock2024newswirelargescalestructureddatabase,
+ title={Newswire: A Large-Scale Structured Database of a Century of Historical News},
+ author={Emily Silcock and Abhishek Arora and Luca D'Amico-Wong and Melissa Dell},
+ year={2024},
+ eprint={2406.09490},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2406.09490},
+ }
+ ```
+
+ # Applications
+
+ We applied this model to a century of historical news articles. You can see all the classifications in the [NEWSWIRE dataset](https://huggingface.co/datasets/dell-research-harvard/newswire).