Update README.md
README.md
CHANGED
@@ -9,51 +9,73 @@ metrics:
|
|
9 |
model-index:
|
10 |
- name: nllb200-ar-en
|
11 |
results: []
|
12 |
---
|
13 |
|
14 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
15 |
should probably proofread and complete it, then remove this comment. -->
|
16 |
|
17 |
-
# nllb200-ar-en
|
18 |
|
19 |
-
|
20 |
-
|
21 |
-
- Loss: 1.0996
|
22 |
-
- Bleu: 39.5933
|
23 |
|
24 |
-
## Model description
|
25 |
|
26 |
-
|
27 |
|
28 |
-
|
29 |
|
30 |
-
|
31 |
|
32 |
-
|
33 |
|
34 |
-
|
35 |
|
36 |
-
|
37 |
|
38 |
-
|
39 |
|
40 |
-
|
41 |
-
|
42 |
-
|
43 |
-
|
44 |
-
- seed: 42
|
45 |
-
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
46 |
-
- lr_scheduler_type: linear
|
47 |
-
- num_epochs: 10
|
48 |
-
- mixed_precision_training: Native AMP
|
49 |
|
50 |
-
|
51 |
|
52 |
|
|
|
53 |
|
54 |
-
|
55 |
|
56 |
-
- Transformers 4.35.2
|
57 |
-
- Pytorch 2.1.0+cu118
|
58 |
-
- Datasets 2.15.0
|
59 |
-
- Tokenizers 0.15.0
|
|
|
9 |
model-index:
|
10 |
- name: nllb200-ar-en
|
11 |
results: []
|
12 |
+
datasets:
|
13 |
+
- nadsoft/Arabic-dialect-2-English
|
14 |
+
language:
|
15 |
+
- ar
|
16 |
+
- en
|
17 |
---
|
18 |
|
19 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
20 |
should probably proofread and complete it, then remove this comment. -->
|
21 |
|
|
|
22 |
|
23 |
+
## Model Description
|
24 |
+
Faseeh is a pre-trained machine translation model that specializes in translating Arabic dialects into English. It reflects NADSOFT's commitment to improving the quality of AI output for Arabic speakers, particularly in the Middle East and North Africa (MENA) region and the wider Arab world, and aims to handle the linguistic subtleties of the dialects spoken in these communities.
|
|
|
|
|
25 |
|
|
|
26 |
|
27 |
+
## Intended Uses & Limitations
|
28 |
|
29 |
+
Faseeh is still under development, and users should be aware of its limitations. For instance, the model may struggle to translate text in strongly marked dialects, such as Moroccan Arabic. It may also have difficulty with text transcribed from recordings with significant background noise.
|
30 |
|
31 |
+
Faseeh makes mistakes, so it should not be relied on to produce text for legal, medical, or other sensitive contexts.
|
32 |
|
33 |
+
**Faseeh does not yet handle all dialects. It currently supports Modern Standard Arabic (MSA), Egyptian, Levantine, Algerian, and Moroccan; support for other dialects is in progress. Keep these limitations in mind when using Faseeh for translation.**
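
The dialect names above do not map one-to-one onto the language codes the usage example expects (it passes `src_lang='ajp_Arab'`). The sketch below is one possible lookup from dialect name to NLLB-200 (FLORES-200) code. The helper name `nllb_code_for`, the dictionary, and the choice of South Levantine for "Levantine" are assumptions made for illustration, not something this card specifies. Algerian Arabic does not appear to have its own code in that list, so the sketch leaves it out rather than guess.

```python
# Hypothetical lookup table: dialect names from this card mapped to
# NLLB-200 (FLORES-200) language codes. This mapping is an assumption,
# not a documented part of Faseeh.
DIALECT_TO_NLLB_CODE = {
    "msa": "arb_Arab",        # Modern Standard Arabic
    "egyptian": "arz_Arab",   # Egyptian Arabic
    "levantine": "ajp_Arab",  # South Levantine, the code used in the card's example
    "moroccan": "ary_Arab",   # Moroccan Arabic
}


def nllb_code_for(dialect, default="arb_Arab"):
    """Return the language code for a dialect name, or `default` if unknown."""
    key = dialect.strip().lower()
    return DIALECT_TO_NLLB_CODE.get(key, default)
```

Falling back to Modern Standard Arabic for unknown dialects is a design choice of this sketch; raising an error instead may be safer when the dialect matters.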
|
34 |
|
35 |
+
## Training and evaluation
|
36 |
+
> Before fine-tuning
|
37 |
+
> ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/xUYo3XGZ3m09ssmEgGhNj.png)
|
38 |
+
<br>
|
39 |
+
> After fine-tuning
|
40 |
+
> ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/fniG0t-KTzwaxk1JUcjcZ.png)
|
41 |
+
<br>
|
42 |
+
## How To Use
|
43 |
+
```python
|
44 |
+
# Load model directly
|
45 |
+
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
|
46 |
+
model_c = 'nadsoft/Faseeh-v0.1-beta'
|
47 |
+
tokenizer = AutoTokenizer.from_pretrained(model_c)
|
48 |
+
model = AutoModelForSeq2SeqLM.from_pretrained(model_c)
|
49 |
|
50 |
+
# Use the translation pipeline
|
51 |
|
52 |
+
from transformers import pipeline
|
53 |
+
text = 'هنا النص اللي انت عايز تترجمه'
|
54 |
+
translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang='ajp_Arab', tgt_lang='eng_Latn', max_length=400)
|
55 |
|
56 |
+
output = translator(text)
|
57 |
+
translated_text = output[0]['translation_text']
|
58 |
+
print(translated_text)
|
59 |
+
# Output: Here is the text that you want to translate
|
60 |
|
61 |
+
# Use the model directly
|
62 |
|
63 |
+
# Translate from Arabic to English
|
64 |
+
text = "ููุฏ ุงููููู
ููู ุงูู ุชููู
ู ุงููููู
ููู ูุงูุ ุงููุงุณ ุงูู ุจููุชุจูุง ุฑุฃููู
ุนู ุงูุงููุงู
ุจููุญูู ุนููู
ููุงุฏ ุงูุงููุงู
."
|
65 |
+
inputs = tokenizer(text, return_tensors="pt")
|
66 |
+
outputs = model.generate(**inputs, max_length=128, num_beams=4, early_stopping=True)
|
67 |
+
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
68 |
+
# Output: Movie criticism is the evaluation of a movie as it was, people who write their opinion about movies are talked about by movie critics.
|
69 |
+
```
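
The direct `generate` call above relies on the fine-tuned model producing English by default. The sketch below shows one way to make the source and target languages explicit and to translate several sentences in one batch. It assumes the checkpoint keeps the NLLB tokenizer conventions (a `src_lang` attribute and language codes such as `eng_Latn` stored as tokens). `translate_batch` and its default values are hypothetical names chosen for illustration, not part of the Faseeh release.

```python
def translate_batch(texts, tokenizer, model,
                    src_lang="ajp_Arab", tgt_lang="eng_Latn",
                    max_length=128, num_beams=4):
    """Translate a list of sentences with an NLLB-style seq2seq model.

    Hypothetical helper: assumes the tokenizer exposes `src_lang` and
    knows the target language code as a token (as NLLB tokenizers do).
    """
    # Tell the tokenizer which source-language tag to prepend.
    tokenizer.src_lang = src_lang
    # Pad so that sentences of different lengths fit in one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Force the first generated token to be the target-language tag.
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=max_length,
        num_beams=num_beams,
    )
    # Decode every sequence in the batch back to plain text.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the block above, `translate_batch([text], tokenizer, model)` would return a one-element list containing the English translation.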
|
70 |
+
## Examples
|
71 |
|
72 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/bwF8CFQbSUT56TDaOgXgq.png)
|
73 |
|
74 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/I3FARzFQNqoExMi5SOE1y.png)
|
75 |
+
|
76 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/32Oq7gugNp2-2f7311Ex0.png)
|
77 |
+
|
78 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/dyBkeGilkWFKlSU2_LPYH.png)
|
79 |
+
|
80 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/yztE4Szw1Ci_fW1aL5HI1.png)
|
81 |