Ahmed107 committed on
Commit 4676983 • 1 Parent(s): 1c56a63

Update README.md

Files changed (1)
  1. README.md +50 -28
README.md CHANGED
@@ -9,51 +9,73 @@ metrics:
  model-index:
  - name: nllb200-ar-en
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # nllb200-ar-en

- This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0996
- - Bleu: 39.5933

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 32
- - eval_batch_size: 64
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 10
- - mixed_precision_training: Native AMP

- ### Training results



- ### Framework versions

- - Transformers 4.35.2
- - Pytorch 2.1.0+cu118
- - Datasets 2.15.0
- - Tokenizers 0.15.0
 
  model-index:
  - name: nllb200-ar-en
  results: []
+ datasets:
+ - nadsoft/Arabic-dialect-2-English
+ language:
+ - ar
+ - en
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->


+ ## Model Description
+ Faseeh is a pre-trained machine translation model that specializes in translating Arabic dialects into English. It reflects NADSOFT's dedication to improving the quality of AI output for Arabic speakers, particularly in the Middle East and North Africa (MENA) region and the broader Arab world, by addressing the distinct linguistic subtleties and specific requirements of these communities.


+ ## Intended Uses & Limitations

+ Faseeh is still in development, and users should be mindful of its limitations. For instance, the model may struggle to accurately translate text from speakers with strong accents, such as Moroccan Arabic. Faseeh may also face difficulties in transcribing text from recordings with significant background noise.

+ Faseeh is not flawless and should not be relied upon to generate text for legal, medical, or other sensitive contexts.

+ **Faseeh does not yet handle all dialects. It currently supports Modern Standard Arabic (MSA), Egyptian, Levantine, Algerian, and Moroccan, and work is underway to support other dialects in the near future. Users are advised to consider these limitations when using Faseeh for their translation needs.**

+ ## Training and evaluation
+ > Before fine-tuning
+ > ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/xUYo3XGZ3m09ssmEgGhNj.png)
+ <br>
+ > After fine-tuning
+ > ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/fniG0t-KTzwaxk1JUcjcZ.png)
+ <br>
+ ## How To Use
+ ```python
+ # Load model directly
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ model_c = 'nadsoft/Faseeh-v0.1-beta'
+ tokenizer = AutoTokenizer.from_pretrained(model_c)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_c)

+ # Use the pipeline

+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
+ text = 'هنا النص اللي انت عاوز تترجمه'
+ translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang='ajp_Arab', tgt_lang='eng_Latn', max_length=400)

+ output = translator(text)
+ translated_text = output[0]['translation_text']
+ print(translated_text)
+ # output ===> Here is the text that you want to translate

+ # Use the model directly

+ # Translate from Arabic to English
+ text = "نقد الفيلم هوه انك تقيمي الفيلم كيف كان، الناس الي بيكتبوا رأيهم عن الافلام بينحكي عنهم نقاد الافلام."
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_length=128, num_beams=4, early_stopping=True)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ # output ===> Movie criticism is the evaluation of a movie as it was, people who write their opinion about movies are talked about by movie critics.
+ ```
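
The direct `generate` call in the snippet above names neither a source nor a target language, while the pipeline call passes `src_lang` and `tgt_lang`. The following is a minimal sketch of making both explicit in the direct call. It assumes the checkpoint still uses the NLLB tokenizer and language codes of its `facebook/nllb-200-distilled-600M` base. The dialect-to-code table and the helper names `nllb_source_code` and `translate` are illustrative and do not come from the model card. No code is assumed here for Algerian.

```python
# Sketch only: this dialect-to-code table is an assumption, not part of the model card.
NLLB_ARABIC_CODES = {
    "msa": "arb_Arab",        # Modern Standard Arabic
    "egyptian": "arz_Arab",
    "levantine": "ajp_Arab",  # South Levantine, the code used in the pipeline example
    "moroccan": "ary_Arab",
}
TARGET_LANG = "eng_Latn"


def nllb_source_code(dialect: str) -> str:
    """Return the NLLB language code mapped to a dialect name (hypothetical helper)."""
    key = dialect.strip().lower()
    if key not in NLLB_ARABIC_CODES:
        raise ValueError(f"No NLLB code mapped for dialect {dialect!r}")
    return NLLB_ARABIC_CODES[key]


def translate(text: str, dialect: str, model_id: str = "nadsoft/Faseeh-v0.1-beta") -> str:
    """Translate one string to English with explicit source and target codes."""
    # Imported lazily so the mapping helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang=nllb_source_code(dialect))
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        # Force the decoder to start with the English language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(TARGET_LANG),
        max_length=128,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `translate(text, "levantine")` downloads the checkpoint on first use. Whether forcing the target token changes the output for this fine-tuned checkpoint is not something the diff confirms.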
+ # Examples

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/bwF8CFQbSUT56TDaOgXgq.png)

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/I3FARzFQNqoExMi5SOE1y.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/32Oq7gugNp2-2f7311Ex0.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/dyBkeGilkWFKlSU2_LPYH.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cf5b1ee9cac0020b8dd54d/yztE4Szw1Ci_fW1aL5HI1.png)