A monolingual T5 model for Persian, trained on the OSCAR 21.09 corpus (https://oscar-corpus.com/) with a self-supervised objective. The pre-training data was a 35 GB deduplicated version of the Persian portion of the corpus.
It is similar to the English T5 model, but for Persian only. You may need to fine-tune it for your specific task.
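The self-supervised objective is not spelled out in this card. Assuming it follows the span-corruption scheme of the original T5 (which the `<extra_id_N>` sentinel tokens in the usage example suggest), a single training pair could be built roughly as in the sketch below. The function name `span_corrupt`, the hand-picked span positions, and the filler words are illustrative assumptions, not taken from the parsT5 training code.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (input, target) pair by masking the given spans.

    tokens: list of word strings.
    spans:  sorted, non-overlapping (start, end) index pairs, end exclusive.
    """
    inp, tgt, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[cursor:start])  # keep the text before the span
        inp.append(sentinel)              # the span is replaced by one sentinel
        tgt.append(sentinel)              # the target repeats the sentinel...
        tgt.extend(tokens[start:end])     # ...followed by the masked words
        cursor = end
    inp.extend(tokens[cursor:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return " ".join(inp), " ".join(tgt)


# Illustrative sentence; the masked words are assumptions chosen for the demo.
source, target = span_corrupt(
    ["دانش", "آموزان", "به", "مدرسه", "میروند", "و", "کتاب", "میخوانند."],
    [(3, 4), (6, 7)],
)
print(source)
print(target)
```

In real pre-training the spans are sampled at random over subword tokens rather than chosen by hand over whole words.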
Example code:
```
from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

model_name = "Ahmad/parsT5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Mask two spans with sentinel tokens; the model predicts the missing text.
input_ids = tokenizer.encode('دانش آموزان به <extra_id_0> میروند و <extra_id_1> میخوانند.', return_tensors='pt')

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    hypotheses = model.generate(input_ids)

for h in hypotheses:
    print(tokenizer.decode(h))
```
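The decoded output of a T5-style model interleaves sentinel tokens with the predicted spans. As a minimal sketch, the spans can be pulled out and substituted back into the masked sentence as shown here. The helper names `extract_spans` and `fill_in` are hypothetical, and the `decoded` string is hand-written in the general format that `tokenizer.decode` returns; it is not real output from this model.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def extract_spans(decoded):
    """Map each sentinel index to the non-empty text generated after it."""
    # Drop special tokens that tokenizer.decode keeps by default.
    text = decoded.replace("<pad>", "").replace("</s>", "")
    # Splitting on a capturing group yields [prefix, idx0, span0, idx1, span1, ...].
    parts = SENTINEL.split(text)
    return {
        int(parts[i]): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
        if parts[i + 1].strip()
    }


def fill_in(masked, spans):
    """Replace each sentinel in the masked input with its predicted span."""
    # Sentinels without a prediction are left in place.
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), masked)


masked = "دانش آموزان به <extra_id_0> میروند و <extra_id_1> میخوانند."
# Hand-written stand-in for a decoded hypothesis (an assumption, not model output).
decoded = "<pad> <extra_id_0> مدرسه <extra_id_1> کتاب <extra_id_2></s>"
print(fill_in(masked, extract_spans(decoded)))
```

Because the model has only been pre-trained, its predicted spans may be short or generic until it is fine-tuned.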
Steps: 725000
Accuracy: 0.66
Training More?
========
To train the model further, please refer to its GitHub repository:
https://github.com/puraminy/parsT5