# HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition
<img align="right" src="https://github.com/avichaychriqui/HeBERT/blob/main/data/heBERT_logo.png?raw=true" width="250">

HeBERT is a Hebrew pretrained language model. It is based on [Google's BERT](https://arxiv.org/abs/1810.04805) architecture and uses the BERT-Base configuration. <br>

HeBERT was trained on three datasets:
1. A Hebrew version of [OSCAR](https://oscar-corpus.com/): ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
2. A Hebrew dump of [Wikipedia](https://dumps.wikimedia.org/): ~650 MB of data, including over 63 million words and 3.8 million sentences.
3. Emotion User Generated Content (UGC) data that was collected for the purpose of this study (described below).

## Named-entity recognition (NER)
This task measures the model's ability to classify named entities in text, such as person names, organizations, and locations. It was tested on a labeled dataset from [Ben Mordecai and M Elhadad (2005)](https://www.cs.bgu.ac.il/~elhadad/nlpproj/naama/) and evaluated with the F1-score.
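
As a rough illustration of the F1 metric mentioned above, the sketch below computes entity-level precision, recall, and F1 from exact span matches. The `(start, end, label)` span format, the label names, and the helper name `entity_f1` are assumptions made for illustration; this is not the evaluation script used in the study.

```python
def entity_f1(gold, predicted):
    """Entity-level precision, recall, and F1 based on exact span matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (start_char, end_char, label).
gold = [(0, 4, "PER"), (10, 30, "ORG"), (32, 40, "LOC")]
predicted = [(0, 4, "PER"), (10, 30, "LOC")]  # second span has the wrong label

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A span counts as correct only when both its boundaries and its label match, so a right span with a wrong label is penalized in both precision and recall.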

### How to use
```
from transformers import pipeline

# Load the HeBERT NER model and its tokenizer into a token-classification pipeline.
NER = pipeline(
    "token-classification",
    model="avichr/heBERT_NER",
    tokenizer="avichr/heBERT_NER",
)
# Example sentence: "David studies at the Hebrew University in Jerusalem".
NER('דויד לומד באוניברסיטה העברית שבירושלים')
```
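
A token-classification pipeline returns a list of dictionaries with keys such as `entity`, `score`, `word`, `start`, and `end`. The sketch below shows one possible way to keep only confident predictions. The helper `filter_entities`, the 0.9 threshold, and the sample labels and scores are hypothetical; the actual label set of `avichr/heBERT_NER` may differ.

```python
def filter_entities(predictions, min_score=0.9, outside_label="O"):
    """Keep confident, non-outside token predictions as (word, label, score)."""
    return [
        (p["word"], p["entity"], round(float(p["score"]), 3))
        for p in predictions
        if p["entity"] != outside_label and p["score"] >= min_score
    ]


# Made-up output in the shape a token-classification pipeline returns;
# labels and scores are illustrative only.
sample = [
    {"entity": "PER", "score": 0.99, "index": 1, "word": "דויד", "start": 0, "end": 4},
    {"entity": "O", "score": 0.98, "index": 2, "word": "לומד", "start": 5, "end": 9},
    {"entity": "LOC", "score": 0.55, "index": 5, "word": "שבירושלים", "start": 29, "end": 38},
]

confident = filter_entities(sample)
print(confident)
```

In practice, `sample` would be replaced with the list returned by the `NER(...)` call above.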

## Other tasks
[**Emotion Recognition Model**](https://huggingface.co/avichr/hebEMO_trust).
An online demo is available on [Hugging Face Spaces](https://huggingface.co/spaces/avichr/HebEMO_demo) or as a [Colab notebook](https://colab.research.google.com/drive/1Jw3gOWjwVMcZslu-ttXoNeD17lms1-ff?usp=sharing).
<br>
[**Sentiment Analysis**](https://huggingface.co/avichr/heBERT_sentiment_analysis).
<br>
[**masked-LM model**](https://huggingface.co/avichr/heBERT) (can be fine-tuned for any downstream task).

## Contact us
[Avichay Chriqui](mailto:[email protected]) <br>
[Inbal Yahav](mailto:[email protected]) <br>
The Coller Semitic Languages AI Lab <br>
Thank you, תודה, شكرا <br>

## If you used this model, please cite us as:
Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.
```
@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={arXiv preprint arXiv:2102.01909},
  year={2021}
}
```
[git](https://github.com/avichaychriqui/HeBERT)