avichr committed on
Commit
fc4a877
•
1 Parent(s): 7ba5ff3

Create README.md

  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
1
+ # HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition
2
+ <img align="right" src="https://github.com/avichaychriqui/HeBERT/blob/main/data/heBERT_logo.png?raw=true" width="250">
3
+
4
+ HeBERT is a Hebrew pretrained language model. It is based on [Google's BERT](https://arxiv.org/abs/1810.04805) architecture and uses the BERT-Base configuration. <br>
5
+
6
+ HeBERT was trained on three datasets:
7
+ 1. A Hebrew version of [OSCAR](https://oscar-corpus.com/): ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
8
+ 2. A Hebrew dump of [Wikipedia](https://dumps.wikimedia.org/): ~650 MB of data, including over 63 million words and 3.8 million sentences.
9
+ 3. Emotion User Generated Content (UGC) data that was collected for the purpose of this study (described below).
10
+
11
+
12
+ ## Named-entity recognition (NER)
13
+ This task measures the model's ability to classify named entities in text, such as persons' names, organizations, and locations. It was tested on a labeled dataset from [Ben Mordecai and M Elhadad (2005)](https://www.cs.bgu.ac.il/~elhadad/nlpproj/naama/) and evaluated with the F1-score.
14
+
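As a rough illustration of the F1 metric mentioned above, the sketch below computes entity-level precision, recall, and F1 from two sets of labeled spans. The `(start, end, label)` span format, the function name `entity_f1`, and the sample spans are assumptions made for this example; the source does not say whether scoring was done per token or per entity.

```python
from typing import Set, Tuple

# Hypothetical span format: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span-and-label matches."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans for illustration only; the labels are not taken from the model.
gold_spans = {(0, 4, "PER"), (14, 39, "ORG")}
predicted_spans = {(0, 4, "PER"), (14, 30, "ORG")}
precision, recall, f1 = entity_f1(gold_spans, predicted_spans)
```

Exact matching is strict: a predicted span with the right label but a wrong boundary counts as both a false positive and a false negative.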
15
+ ### How to use
16
+ ```
17
+ from transformers import pipeline
18
+
19
+ # Load the NER model and its tokenizer as a token-classification pipeline
20
+ NER = pipeline(
21
+ "token-classification",
22
+ model="avichr/heBERT_NER",
23
+ tokenizer="avichr/heBERT_NER",
24
+ )
25
+ NER('דויד לומד באוניברסיטה העברית שבירושלים')
26
+ ```
27
+
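A token-classification pipeline like the one above returns one dictionary per token, with keys such as `entity`, `word`, `start`, and `end`. The sketch below shows one way to merge consecutive tokens into entity spans using the character offsets. The helper `group_entities`, the `B-`/`I-` label prefixes, and the sample predictions are assumptions for illustration; check the model's actual label set before relying on it.

```python
from typing import Dict, List


def group_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge adjacent token predictions that share an entity type."""
    spans: List[Dict] = []
    for tok in tokens:
        label = tok["entity"]
        if label == "O":
            continue  # skip tokens outside any entity
        # Strip an assumed BIO prefix ("B-PER" -> "PER"); plain labels pass through.
        prefix, _, kind = label.rpartition("-")
        starts_new = prefix == "B"
        last = spans[-1] if spans else None
        # Tokens count as adjacent when at most one character separates them.
        adjacent = last is not None and tok["start"] - last["end"] <= 1
        if last and adjacent and last["label"] == kind and not starts_new:
            last["end"] = tok["end"]
        else:
            spans.append({"label": kind, "start": tok["start"], "end": tok["end"]})
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written predictions in the pipeline's output shape (not real model output).
sample_text = "Dana Cohen lives in Haifa"
sample_tokens = [
    {"entity": "B-PER", "word": "Dana", "start": 0, "end": 4},
    {"entity": "I-PER", "word": "Cohen", "start": 5, "end": 10},
    {"entity": "B-LOC", "word": "Haifa", "start": 20, "end": 25},
]
entities = group_entities(sample_text, sample_tokens)
```

Recent versions of `transformers` also accept an `aggregation_strategy` argument on the pipeline, which performs a similar grouping without custom code.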
28
+ ## Other tasks
29
+ [**Emotion Recognition Model**](https://huggingface.co/avichr/hebEMO_trust).
30
+ An online model is available on [Hugging Face Spaces](https://huggingface.co/spaces/avichr/HebEMO_demo) or as a [Colab notebook](https://colab.research.google.com/drive/1Jw3gOWjwVMcZslu-ttXoNeD17lms1-ff?usp=sharing).
31
+ <br>
32
+ [**Sentiment Analysis**](https://huggingface.co/avichr/heBERT_sentiment_analysis).
33
+ <br>
34
+ [**masked-LM model**](https://huggingface.co/avichr/heBERT) (can be fine-tuned for any downstream task).
35
+
36
+ ## Contact us
37
+ [Avichay Chriqui](mailto:[email protected]) <br>
38
+ [Inbal Yahav](mailto:[email protected]) <br>
39
+ The Coller Semitic Languages AI Lab <br>
40
+ Thank you, תודה, شكرا <br>
41
+
42
+ ## If you used this model, please cite us as:
43
+ Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.
44
+ ```
45
+ @article{chriqui2021hebert,
46
+ title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
47
+ author={Chriqui, Avihay and Yahav, Inbal},
48
+ journal={arXiv preprint arXiv:2102.01909},
49
+ year={2021}
50
+ }
51
+ ```
52
+ [git](https://github.com/avichaychriqui/HeBERT)
53
+