---
license: mit
language:
- bn
metrics:
- wer
- cer
tags:
- seq2seq
- ipa
- bengali
- byt5
---
# Regional Bengali text to IPA transcription - byT5-small
This is a fine-tuned version of [byT5-small](https://huggingface.co/google/byt5-small) for generating IPA transcriptions from regional Bengali text.
It was trained on the dataset of the competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview) by Bengali.AI.
Best scores achieved on the competition leaderboards:
- **Public score**: 0.01995
- **Private score**: 0.02072
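The card metadata lists `wer` and `cer` as metrics. As a rough illustration of what such a score measures, the sketch below computes word error rate as the word-level edit distance divided by the reference length. This is a generic textbook formulation, not the competition's official scoring code, and the names `edit_distance` and `word_error_rate` are hypothetical.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance normalised by the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

A character error rate follows the same pattern: pass `list(reference)` and `list(hypothesis)` to `edit_distance` and divide by the number of reference characters.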
## Loading & using the model
```python
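# Load the model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("smji/ben2ipa-byt5small")
model = AutoModelForSeq2SeqLM.from_pretrained("smji/ben2ipa-byt5small")

# The input text must have the format: <district> <bengali_text>
text = "<Chittagong> bengali_text_here"
text_ids = tokenizer(text, return_tensors="pt").input_ids

# A seq2seq model produces text through generate(), not a bare forward call.
# max_new_tokens=256 is an assumed limit; adjust it to your input length.
output_ids = model.generate(text_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))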
```
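The `<district> <bengali_text>` input format can be assembled with a small helper. The following is a minimal sketch: `format_input` is a hypothetical name, not part of the model's API, and the only district tag confirmed by this card is `Chittagong`.

```python
def format_input(district: str, bengali_text: str) -> str:
    """Build a model input of the form '<district> bengali_text'.

    Hypothetical helper for illustration; it assumes the district tag is
    wrapped in angle brackets, as in the example above.
    """
    # Accept the district with or without surrounding angle brackets.
    district = district.strip().strip("<>")
    if not district:
        raise ValueError("district must not be empty")
    return f"<{district}> {bengali_text.strip()}"


text = format_input("Chittagong", "bengali_text_here")
print(text)
```

Keeping the prefix logic in one place helps avoid inputs that silently omit the district tag.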
## Using the pipeline
```python
# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # GPU index, or -1 for CPU
pipe = pipeline("text2text-generation", model="smji/ben2ipa-byt5small", device=device)
```
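Once the pipeline exists, it can be called on a list of inputs. The sketch below wraps that call. `transcribe` is a hypothetical helper, not part of `transformers`, and it assumes the pipeline returns one dict per input with the decoded string under the `generated_text` key.

```python
from typing import Callable, List


def transcribe(pipe: Callable, district: str, sentences: List[str]) -> List[str]:
    """Run a text2text-generation pipeline over several sentences.

    `pipe` is a pipeline object such as the one created above; this
    wrapper is a hypothetical illustration.
    """
    # Prefix every sentence with the district tag the model expects.
    inputs = [f"<{district}> {s}" for s in sentences]
    # Assumption: one dict per input, each holding "generated_text".
    outputs = pipe(inputs)
    return [out["generated_text"] for out in outputs]


# Usage (downloads the model on first run):
# ipa = transcribe(pipe, "Chittagong", ["bengali_text_here"])
```

Passing the pipeline in as an argument keeps the helper easy to test with a stand-in callable.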
## Credits
Done by [S M Jishanul Islam](https://github.com/S-M-J-I), [Sadia Ahmmed](https://github.com/sadia-ahmmed), [Sahid Hossain Mustakim](https://github.com/sratul35)