---
library_name: transformers
license: gpl-3.0
language:
- ar
- en
pipeline_tag: image-to-text
pretty_name: Arabic Small Nougat
datasets:
- Fakhraddin/khatt
base_model:
- facebook/nougat-small
---
# Arabic Small Nougat
End-to-End Structured OCR for Arabic books.
<center>
<img src="https://huggingface.co/MohamedRashad/arabic-small-nougat/resolve/main/thumbnail_image.jpg">
</center>
## Description
The arabic-small-nougat OCR is an end-to-end structured Optical Character Recognition (OCR) system designed specifically for the Arabic language.
The model is based on the [facebook/nougat-small](https://huggingface.co/facebook/nougat-small) architecture and has been fine-tuned using the [Khatt dataset](https://huggingface.co/datasets/Fakhraddin/khatt) along with a custom dataset created for this purpose.
## How to Get Started with the Model
**Demo:** https://huggingface.co/spaces/MohamedRashad/Arabic-Nougat
Or, use the code below to get started with the model locally.
```python
from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the model and processor
processor = NougatProcessor.from_pretrained("MohamedRashad/arabic-small-nougat")
model = VisionEncoderDecoderModel.from_pretrained("MohamedRashad/arabic-small-nougat")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 2048  # maximum number of tokens generated per page


def predict(img_path):
    # Prepare the page image (e.g. a rendered PDF page) for the model
    image = Image.open(img_path)
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Generate the transcription
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # Decode the token IDs and clean up the generated Markdown
    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    page_sequence = processor.post_process_generation(page_sequence, fix_markdown=False)
    return page_sequence


print(predict("path/to/page_image.jpg"))
```
## Bias, Risks, and Limitations
1. **Text Hallucination:** The model may occasionally generate repeated or incorrect text due to the inherent complexities of OCR tasks.
1. **Erroneous Image Paths:** There are instances where the model outputs image paths that are not relevant to the input, indicating occasional confusion.
1. **Context Length Constraint:** The model has a maximum context length of 2048 tokens, which may result in incomplete transcriptions for longer book pages.
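The limitations above can be partly mitigated with light post-processing. The following is a minimal sketch, not part of the model or the `transformers` API: the helper names (`strip_image_refs`, `repeated_lines`, `looks_truncated`) and the repetition threshold are assumptions chosen for illustration, and how you count generated tokens depends on how you call `generate`.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches Markdown image references such as ![alt](path/to/figure.png)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def strip_image_refs(markdown: str) -> str:
    """Remove Markdown image references, which the model may emit spuriously."""
    return IMAGE_REF.sub("", markdown)


def repeated_lines(markdown: str, min_repeats: int = 3) -> list[str]:
    """Return non-empty lines that occur at least `min_repeats` times.

    Many identical lines on one page are a possible sign of hallucinated
    repetition (hypothetical heuristic; tune the threshold for your data).
    """
    counts = Counter(
        line.strip() for line in markdown.splitlines() if line.strip()
    )
    return [line for line, n in counts.items() if n >= min_repeats]


def looks_truncated(num_generated_tokens: int, context_length: int = 2048) -> bool:
    """True when generation used the whole token budget, so text may be cut off."""
    return num_generated_tokens >= context_length
```

Pages flagged by `repeated_lines` or `looks_truncated` are good candidates for manual review; the heuristics only point at suspicious output and do not prove that it is wrong.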
## Intended Use
The arabic-small-nougat OCR is designed for tasks that involve converting images of Arabic book pages into structured text, especially when Markdown format is desired. It is suitable for applications in the field of digitizing Arabic literature and facilitating text extraction from printed materials.
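For digitizing a whole book, one possible workflow is to run the model over a folder of page images and save one Markdown file per page. The sketch below is an assumption about how such a loop could look: `transcribe_folder`, the folder layout, and the accepted file suffixes are hypothetical, and `predict` stands for any function that maps an image path to text, such as the one defined in the getting-started example.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# Image file types treated as book pages (assumed set; extend as needed)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def transcribe_folder(
    pages_dir: str, out_dir: str, predict: Callable[[str], str]
) -> list[Path]:
    """Run `predict` on each page image, in file-name order, and write
    one UTF-8 Markdown file per page into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in sorted(Path(pages_dir).iterdir()):
        if page.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a page image
        target = out / (page.stem + ".md")
        target.write_text(predict(str(page)), encoding="utf-8")
        written.append(target)
    return written
```

Writing one file per page keeps each transcription independent, so a page that needs manual correction can be re-run or edited without touching the rest of the book.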
## Ethical Considerations
It is crucial to be aware of the model's limitations, particularly in instances where accurate OCR results are critical. Users are advised to verify and review the output, especially in scenarios where precision is paramount.
## Model Details
- **Developed by:** Mohamed Rashad
- **Model type:** VisionEncoderDecoderModel
- **Language(s) (NLP):** Arabic & English
- **License:** GPL 3.0
- **Finetuned from model:** [nougat-small](https://huggingface.co/facebook/nougat-small)
## Acknowledgment
If you use or build upon the Arabic Small Nougat OCR, please acknowledge the model developer and the open-source community for their contributions. Additionally, be sure to include a copy of the GPL 3.0 license with any redistributed or modified versions of the model.
The GPL 3.0 license was chosen to promote the principles of open source and to ensure that the benefits of the model are shared with the broader community.
### Citation
If you find this model useful, please consider citing the original [facebook/nougat-small](https://huggingface.co/facebook/nougat-small) model and the datasets used for fine-tuning, including the [Khatt dataset](https://huggingface.co/datasets/Fakhraddin/khatt), along with any available details about the custom dataset.
```bibtex
@misc{blecher2023nougat,
title={Nougat: Neural Optical Understanding for Academic Documents},
author={Lukas Blecher and Guillem Cucurull and Thomas Scialom and Robert Stojnic},
year={2023},
eprint={2308.13418},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{fakhraddin2023khatt,
title={Khatt Arabic Handwriting Dataset},
author={Fakhraddin},
year={2023},
howpublished={\url{https://huggingface.co/datasets/Fakhraddin/khatt}}
}
@misc{rashad2023arabicsmallnougat,
title={Arabic Small Nougat Model},
author={Mohamed Rashad},
year={2023},
howpublished={\url{https://huggingface.co/MohamedRashad/arabic-small-nougat}}
}
```
### Disclaimer
The arabic-small-nougat OCR is a tool provided "as is," and the developers make no guarantees regarding its suitability for specific tasks. Users are encouraged to thoroughly evaluate the model's output for their particular use cases and requirements.