Upload 9 files
- .gitattributes +16 -0
- README.md +106 -0
- config.json +45 -0
- merges.txt +0 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- training_args.bin +3 -0
- vocab.json +0 -0
.gitattributes
ADDED
@@ -0,0 +1,16 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
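
The patterns above decide which uploaded files go through Git LFS rather than being stored as regular Git blobs. As a rough sketch, the check can be approximated with Python's `fnmatch` module. Real gitattributes matching follows gitignore-style rules that differ in some edge cases, and `LFS_PATTERNS` and `is_lfs_tracked` are illustrative names, not part of the repository.

```python
from fnmatch import fnmatchcase

# A subset of the patterns from the .gitattributes file above.
LFS_PATTERNS = ["*.bin.*", "*.lfs.*", "*.bin", "*.h5", "*.onnx", "*.pt", "*.pth"]


def is_lfs_tracked(filename: str) -> bool:
    """Approximate whether a file name matches one of the LFS patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


# File names taken from this commit's file list.
for name in ["training_args.bin", "config.json", "vocab.json"]:
    print(name, is_lfs_tracked(name))
```

Under this approximation only `training_args.bin` matches, which is consistent with that file appearing in the commit as an LFS pointer rather than as binary content.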
README.md
ADDED
@@ -0,0 +1,106 @@
---
language: "en"
tags:
- distilroberta
- sentiment
- emotion
- twitter
- reddit

widget:
- text: "Oh wow. I didn't know that."
- text: "This movie always makes me cry.."
- text: "Oh Happy Day"

---

# Emotion English DistilRoBERTa-base

# Description ℹ

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:

1) anger 🤬
2) disgust 🤢
3) fear 😨
4) joy 😀
5) neutral 😐
6) sadness 😭
7) surprise 😲

The model is a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base). For a 'non-distilled' emotion model, please refer to the model card of the [RoBERTa-large](https://huggingface.co/j-hartmann/emotion-english-roberta-large) version.

# Application 🚀

a) Run the emotion model with 3 lines of code on a single text example using Hugging Face's pipeline command on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)

```python
from transformers import pipeline
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", return_all_scores=True)
classifier("I love this!")
```

```python
# Output:
[[{'label': 'anger', 'score': 0.004419783595949411},
  {'label': 'disgust', 'score': 0.0016119900392368436},
  {'label': 'fear', 'score': 0.0004138521908316761},
  {'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'neutral', 'score': 0.005764586851000786},
  {'label': 'sadness', 'score': 0.002092392183840275},
  {'label': 'surprise', 'score': 0.008528684265911579}]]
```
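
The output is a nested list: one inner list per input text, holding one `{'label', 'score'}` dictionary per emotion. A minimal sketch of reducing that structure to a single predicted emotion follows. The `top_emotion` helper is a hypothetical name, and the scores are the ones shown above, rounded for brevity.

```python
# Scores copied from the example output above (rounded to four decimals).
output = [[
    {"label": "anger", "score": 0.0044},
    {"label": "disgust", "score": 0.0016},
    {"label": "fear", "score": 0.0004},
    {"label": "joy", "score": 0.9772},
    {"label": "neutral", "score": 0.0058},
    {"label": "sadness", "score": 0.0021},
    {"label": "surprise", "score": 0.0085},
]]


def top_emotion(scores):
    """Return the (label, score) pair with the highest score."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# output[0] holds the scores for the first (and only) input text.
label, score = top_emotion(output[0])
print(label, score)
```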

b) Run the emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)
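
Outside the notebook, batch scoring of a CSV file might look roughly like the sketch below. It does not reproduce the notebook: the column name `text`, the `classify_rows` helper, and the `fake_classifier` stand-in are assumptions made for illustration. In practice `classifier` would be the `pipeline(...)` object from example a), called with a list of strings.

```python
import csv
import io


def classify_rows(csv_text, classifier, text_column="text"):
    """Attach the highest-scoring emotion to every row of a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # One call for all texts; each result is a list of {'label', 'score'} dicts.
    results = classifier([row[text_column] for row in rows])
    for row, scores in zip(rows, results):
        best = max(scores, key=lambda item: item["score"])
        row["emotion"] = best["label"]
    return rows


def fake_classifier(texts):
    """Hypothetical stand-in that mimics the pipeline's output shape."""
    return [
        [{"label": "joy", "score": 0.9}, {"label": "neutral", "score": 0.1}]
        for _ in texts
    ]


sample = "id,text\n1,I love this!\n2,Oh Happy Day\n"
labelled = classify_rows(sample, fake_classifier)
print(labelled)
```

Passing the classifier in as an argument keeps the CSV handling testable without downloading the model.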

# Contact 💻

Please reach out to [[email protected]](mailto:[email protected]) if you have any questions or feedback.

Thanks to Samuel Domdey and [chrsiebert](https://huggingface.co/siebert) for their support in making this model available.

# Reference ✅

For attribution, please cite the following reference if you use this model. A working paper will be available soon.

```
Jochen Hartmann, "Emotion English DistilRoBERTa-base". https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/, 2022.
```

BibTeX citation:

```
@misc{hartmann2022emotionenglish,
  author={Hartmann, Jochen},
  title={Emotion English DistilRoBERTa-base},
  year={2022},
  howpublished = {\url{https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/}},
}
```

# Appendix 📚

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets. The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg, Mohammad et al. (2018)|Yes|-|Yes|Yes|-|Yes|-|

The model is trained on a balanced subset from the datasets listed above (2,811 observations per emotion, i.e., nearly 20k observations in total). 80% of this balanced subset is used for training and 20% for evaluation. The evaluation accuracy is 66% (vs. the random-chance baseline of 1/7 = 14%).
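
The figures in this paragraph can be checked with a few lines of arithmetic. The split sizes below assume simple rounding of the 80% share; the model card does not state the exact counts.

```python
NUM_CLASSES = 7    # anger, disgust, fear, joy, neutral, sadness, surprise
PER_CLASS = 2811   # observations per emotion in the balanced subset

total = NUM_CLASSES * PER_CLASS      # size of the balanced subset ("nearly 20k")
train_size = round(total * 0.8)      # assumed rounding, not stated in the card
eval_size = total - train_size
chance = 1 / NUM_CLASSES             # accuracy of uniform random guessing

print(total, train_size, eval_size, f"{chance:.1%}")
```

With seven equally frequent classes, always guessing one class also yields 1/7 accuracy, so the reported 66% is roughly 4.6 times the chance level.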

# Scientific Applications 📖

Below you can find a list of papers using "Emotion English DistilRoBERTa-base". If you would like your paper to be added to the list, please send me an email.

- Butt, S., Sharma, S., Sharma, R., Sidorov, G., & Gelbukh, A. (2022). What goes on inside rumour and non-rumour tweets and their reactions: A Psycholinguistic Analyses. Computers in Human Behavior, 107345.
- Kuang, Z., Zong, S., Zhang, J., Chen, J., & Liu, H. (2022). Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings. arXiv preprint arXiv:2210.00434.
- Rozado, D., Hughes, R., & Halberstadt, J. (2022). Longitudinal analysis of sentiment and emotion in news media headlines using automated labelling with Transformer language models. PLOS ONE, 17(10), e0276367.
config.json
ADDED
@@ -0,0 +1,45 @@
{
  "_name_or_path": "distilroberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "anger",
    "1": "disgust",
    "2": "fear",
    "3": "joy",
    "4": "neutral",
    "5": "sadness",
    "6": "surprise"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "anger": 0,
    "disgust": 1,
    "fear": 2,
    "joy": 3,
    "neutral": 4,
    "sadness": 5,
    "surprise": 6
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "transformers_version": "4.6.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
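
The `id2label` mapping and the `single_label_classification` problem type together describe how seven raw logits become one emotion label: a softmax over the logits, then a lookup of the winning index. A minimal pure-Python sketch of that step follows. The logit values are made up, the `softmax` and `decode` helpers are illustrative names, and integer keys are used here although `config.json` stores them as strings.

```python
import math

# Mirrors "id2label" in config.json, with integer keys for convenience.
ID2LABEL = {0: "anger", 1: "disgust", 2: "fear", 3: "joy",
            4: "neutral", 5: "sadness", 6: "surprise"}


def softmax(logits):
    """Turn raw logits into probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


logits = [-1.2, -2.0, -3.1, 4.5, -0.9, -1.8, -0.4]  # made-up values
print(decode(logits))
```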
merges.txt
ADDED
The diff for this file is too large to render.
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "distilroberta-base"}
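
Read together, the two tokenizer files say that a sequence begins with `<s>`, ends with `</s>`, is padded with `<pad>`, and is capped at 512 tokens. A toy sketch of that layout for a single sequence is shown below. The `wrap` helper and the word-level tokens are hypothetical; the real tokenizer works on byte-level BPE ids, and right-side padding is assumed here.

```python
BOS, EOS, PAD = "<s>", "</s>", "<pad>"   # from special_tokens_map.json
MODEL_MAX_LENGTH = 512                   # from tokenizer_config.json


def wrap(tokens, max_length=MODEL_MAX_LENGTH, pad_to=None):
    """Truncate, add <s> ... </s>, and optionally right-pad with <pad>."""
    budget = max_length - 2  # room left after the two special tokens
    wrapped = [BOS] + tokens[:budget] + [EOS]
    if pad_to is not None:
        wrapped += [PAD] * max(0, pad_to - len(wrapped))
    return wrapped


short = wrap(["I", "love", "this", "!"], pad_to=8)
long_input = wrap(["tok"] * 1000)
print(short)
print(len(long_input))
```

Under these assumptions at most 510 content tokens survive truncation, since two of the 512 positions are taken by `<s>` and `</s>`.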
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ed7a68d54395ab0be21726d6fcf25f942ed459b16387bbf9cf251051986766f
size 2415
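
The three lines above are a Git LFS pointer, not the binary file itself: they record the spec version, the SHA-256 digest of the real content, and its size in bytes. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is an illustrative helper, not part of Git LFS or of this repository.

```python
# Pointer text copied from the training_args.bin diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4ed7a68d54395ab0be21726d6fcf25f942ed459b16387bbf9cf251051986766f
size 2415
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["algorithm"], info["size"], len(info["digest"]))
```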
vocab.json
ADDED
The diff for this file is too large to render.