---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for Lougat
<!-- Provide a quick summary of what the model is/does. -->
This project aims to create a text scanner that converts images of paper documents into machine-readable formats (e.g., Markdown, JSON). It is the son of Nougat, and thus the grandson of Donut.
The key idea is to combine the bounding-box modality with text, achieving a pixel-scan behavior in which the model predicts not only the next token but also its position on the page.
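As a toy sketch of this idea (not the actual Lougat implementation; the module and variable names below are hypothetical), a decoder step can attach two heads to a shared hidden state, one predicting the next-token distribution and one regressing the next bounding box:

```python
# Toy illustration of joint next-token + next-position prediction.
# This is NOT the Lougat code; it only sketches the two-headed idea.
import torch
import torch.nn as nn

class TokenAndPositionHead(nn.Module):
    """Two heads over a shared decoder hidden state:
    - token_head: logits over the vocabulary (next token)
    - bbox_head: a normalized (x1, y1, x2, y2) box (next position)"""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.token_head = nn.Linear(hidden_size, vocab_size)
        self.bbox_head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, 4),
            nn.Sigmoid(),  # keep box coordinates in [0, 1] page space
        )

    def forward(self, hidden: torch.Tensor):
        return self.token_head(hidden), self.bbox_head(hidden)

head = TokenAndPositionHead(hidden_size=64, vocab_size=100)
hidden = torch.randn(1, 64)          # last decoder hidden state (toy input)
logits, bbox = head(hidden)
next_token = logits.argmax(dim=-1)   # greedy choice of the next token
print(logits.shape, bbox.shape, next_token.shape)
```

At each decoding step both outputs are produced together, so the generated text carries its own layout trace.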
![Example Image](https://raw.githubusercontent.com/veya2ztn/Lougat/main/images/image.png)
The name "Lougat" combines LLaMA and Nougat. The approach is a natural continuation of the paper [LOCR: Location-Guided Transformer for Optical Character Recognition](https://arxiv.org/abs/2403.02127).
Current Branch: The **Flougat** model
Other Branches:

- Florence2 + LLaMA → Flougat
- SAM2 + LLaMA → Slougat
- Nougat + relative-position-embedding LLaMA → Rlougat
# Inference and Training
For training and inference code, please see [veya2ztn/Lougat](https://github.com/veya2ztn/Lougat).