Update README.md
library_name: transformers
pipeline_tag: fill-mask
---

### SRDberta

This is a BERT model trained for masked language modeling on Hinglish data.

Hinglish is a hybrid language spoken in India that combines elements of Hindi and English. It is commonly used in informal conversation and in media such as Bollywood films.
### Dataset

The model was trained on the Hinglish-Top [Dataset](https://huggingface.co/datasets/WillHeld/hinglish_top), which provides the following columns (a loading sketch follows the list):
- en_query
- cs_query
- en_parse
- cs_parse
- domain
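Loading the dataset is not shown on the card; below is a minimal sketch using the 🤗 `datasets` library, assuming the Hub ID `WillHeld/hinglish_top` from the link above and a `train` split:

```python
# Hedged sketch: pull the Hinglish-TOP dataset and peek at one code-switched query.
# The Hub ID and split name are assumptions taken from the dataset link above.
from datasets import load_dataset

dataset = load_dataset("WillHeld/hinglish_top")
print(dataset)                          # available splits and columns
print(dataset["train"][0]["cs_query"])  # one Hinglish (code-switched) query
```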
### Training

| Epochs | Train Loss |
|:------:|:----------:|
|   4    |   0.251    |

The model was trained for only 4 epochs due to GPU limitations; it should give far better results with around 10 epochs.
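The training script itself is not part of the card; the sketch below only illustrates a standard BERT-style masked-language-modeling fine-tune with the `transformers` Trainer. The base checkpoint, the choice of the `cs_query` column, and the hyperparameters are assumptions for illustration, not the author's recorded settings:

```python
# Hedged sketch of a generic MLM fine-tune; not the author's original training script.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "bert-base-multilingual-cased"   # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Tokenize the code-switched queries from Hinglish-TOP.
dataset = load_dataset("WillHeld/hinglish_top")
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["cs_query"], truncation=True, max_length=128),
    batched=True,
    remove_columns=dataset["train"].column_names,
)

# Standard BERT objective: randomly mask 15% of tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="SRDBerta", num_train_epochs=4,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```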
### Inference

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("SRDdev/SRDBerta")
model = AutoModelForMaskedLM.from_pretrained("SRDdev/SRDBerta")

fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```

```python
fill_mask = fill.tokenizer.mask_token
fill(f'Aap {fill_mask} ho?')
```
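The fill-mask pipeline returns a list of candidate fills ranked by score; a small usage sketch for printing them (the actual predictions depend on the trained weights):

```python
# Print each candidate token and its score for the masked Hinglish sentence.
for prediction in fill(f'Aap {fill_mask} ho?'):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```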
### Citation

Author: @[SRDdev](https://huggingface.co/SRDdev)
```
Name     : Shreyas Dixit
Framework: PyTorch
Year     : Jan 2023
Pipeline : fill-mask
GitHub   : https://github.com/SRDdev
LinkedIn : https://www.linkedin.com/in/srddev/
```