---
license: afl-3.0
datasets:
  - WillHeld/hinglish_top
language:
  - en
  - hi
metrics:
  - accuracy
library_name: transformers
pipeline_tag: fill-mask
---

# SRDBerta

This is a BERT model trained for masked language modeling on Hinglish data.

Hinglish is a hybrid language spoken in India that combines elements of Hindi and English. It is commonly used in informal conversation and in media such as Bollywood films.

## Dataset

The Hinglish-TOP dataset (`WillHeld/hinglish_top`) contains the following columns:

- `en_query`
- `cs_query`
- `en_parse`
- `cs_parse`
- `domain`
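
For reference, the dataset can be pulled directly from the Hugging Face Hub with the `datasets` library. This is a minimal loading sketch; the split name `"train"` is an assumption and may differ.

```python
from datasets import load_dataset

# Load the Hinglish-TOP dataset used for training (columns as listed above).
dataset = load_dataset("WillHeld/hinglish_top")

# Inspect the available splits and one example (the "train" split is an assumption).
print(dataset)
print(dataset["train"][0])
```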

## Training

| Epoch | Train Loss |
|-------|------------|
| 4     | 0.251      |

The model was trained for only 4 epochs due to GPU limitations; training for more epochs (e.g. 10) would likely improve results.
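
The original training script is not included in this card. The following is a minimal sketch of masked language modeling fine-tuning with 🤗 Transformers; the base checkpoint, the use of the `cs_query` column as raw text, and the hyperparameters (other than the 4 epochs reported above) are assumptions for illustration only.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical base checkpoint; the exact starting model is not stated in this card.
base_model = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# Assumption: the code-switched queries (`cs_query`) serve as raw text for MLM.
dataset = load_dataset("WillHeld/hinglish_top")

def tokenize(batch):
    return tokenizer(batch["cs_query"], truncation=True, max_length=128)

tokenized = dataset.map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

# Randomly mask 15% of tokens for the masked language modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="srdberta-hinglish",
    num_train_epochs=4,              # matches the 4 epochs reported above
    per_device_train_batch_size=16,  # illustrative value
    learning_rate=5e-5,              # illustrative value
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```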

## Inference

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the tokenizer and model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("SRDdev/SRDBerta")
model = AutoModelForMaskedLM.from_pretrained("SRDdev/SRDBerta")

# Build a fill-mask pipeline from the loaded model and tokenizer.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Insert the tokenizer's mask token into the prompt and predict it.
fill_mask = fill.tokenizer.mask_token
fill(f"Aap {fill_mask} ho?")
```
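
The pipeline returns a list of candidate fills, each with a score, the predicted token, and the completed sequence. A short way to inspect the top predictions (no actual model outputs are shown here):

```python
for pred in fill(f"Aap {fill_mask} ho?"):
    # Each prediction carries the filled sequence, the token string, and a score.
    print(pred["sequence"], pred["token_str"], round(pred["score"], 4))
```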

## Citation

Author: @SRDdev

- Name: Shreyas Dixit
- Framework: PyTorch
- Year: Jan 2023
- Pipeline: fill-mask
- GitHub: https://github.com/SRDdev
- LinkedIn: https://www.linkedin.com/in/srddev/