SRDdev committed on
Commit c113a01 (1 parent: 64923ff)

Update README.md

  1. README.md +24 -4
README.md CHANGED
@@ -10,14 +10,30 @@ metrics:
 library_name: transformers
 pipeline_tag: fill-mask
 ---
+
 ### SRDberta
 
-This is a BERT model trained for Masked Language Modeling for Higlish Data.
+This is a BERT model trained for Masked Language Modeling for Hinglish Data.
 
 Hinglish is a term used to describe the hybrid language spoken in India, which combines elements of Hindi and English. It is commonly used in informal conversations and in media such as Bollywood films
 
+### Dataset
+Hinglish-Top [Dataset](https://huggingface.co/datasets/WillHeld/hinglish_top) columns
+- en_query
+- cs_query
+- en_parse
+- cs_parse
+- domain
+
+### Training
+|Epochs|Train Loss|
+|:------:|:----------:|
+|4th | 0.251 |
+
+The model was trained only for 4 epochs due to the GPU limitations. The model will give far better results with 10 epochs
+
 ### Inference
-```
+```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
 
 tokenizer = AutoTokenizer.from_pretrained("SRDdev/SRDBerta")
@@ -26,7 +42,7 @@ model = AutoModelForMaskedLM.from_pretrained("SRDdev/SRDBerta")
 
 fill = pipeline('fill-mask', model='SRDberta', tokenizer='SRDberta')
 ```
-```
+```python
 fill_mask = fill.tokenizer.mask_token
 fill(f'Aap {fill_mask} ho?')
 ```
@@ -34,6 +50,10 @@ fill(f'Aap {fill_mask} ho?')
 ### Citation
 Author: @[SRDdev](https://huggingface.co/SRDdev)
 ```
+Name : Shreyas Dixit
 framework : Pytorch
 Year: Jan 2023
-```
+Pipeline : fill-mask
+Github : https://github.com/SRDdev
+LinkedIn : https://www.linkedin.com/in/srddev/
+```
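
Review note on the Inference snippet: the README loads `tokenizer` and `model` from `SRDdev/SRDBerta`, but then passes the bare string `'SRDberta'` to `pipeline`. As far as I can tell, that string would be looked up as a local directory or a namespace-less Hub id rather than reusing the objects that were just loaded. Below is a minimal sketch of one way to wire this up, assuming the intended repository is `SRDdev/SRDBerta` as in the `from_pretrained` calls. The names `REPO_ID`, `build_fill_mask`, and `top_tokens` are hypothetical helpers introduced here, not part of the commit. `top_tokens` relies only on the `score` and `token_str` keys that the fill-mask pipeline returns for each candidate.

```python
from typing import Any, Dict, List, Tuple

# Assumption: same repo id as the from_pretrained calls in the README.
REPO_ID = "SRDdev/SRDBerta"


def build_fill_mask(repo_id: str = REPO_ID):
    """Build a fill-mask pipeline from an explicit Hub repo id.

    Downloads the tokenizer and weights on first use, so it needs network access.
    """
    # Imported lazily so top_tokens() can be used without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForMaskedLM.from_pretrained(repo_id)
    # Pass the loaded objects instead of the bare string 'SRDberta'.
    return pipeline("fill-mask", model=model, tokenizer=tokenizer)


def top_tokens(predictions: List[Dict[str, Any]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(float(p["score"]), 3)) for p in ranked[:k]]


# Usage (requires network access to the Hub):
#   fill = build_fill_mask()
#   print(top_tokens(fill(f"Aap {fill.tokenizer.mask_token} ho?")))
```

Passing the already-loaded `model` and `tokenizer` avoids a second lookup by name. Passing the full repo id string to `pipeline` would be an equally reasonable alternative.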