Tianning committed 14e0c09 (parent: c30b107)

Create README.md

Files changed (1): README.md (added, +31 -0)

---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This project aims to build a text scanner that converts images of papers into machine-readable formats (e.g., Markdown, JSON). It is the son of Nougat and, thus, the grandson of Donut.

The key idea is to combine the bounding-box modality with text. This yields a pixel-scanning behavior in which the model predicts not only the next token but also the next position.
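
The following is a minimal, framework-free sketch of a joint training objective for the "next token plus next position" idea. It assumes each decoding step outputs token logits together with a normalized bounding box `(x0, y0, x1, y1)`. It pairs token cross-entropy with an L1 box loss, weighted by a hypothetical `box_weight`. The names `StepOutput`, `joint_loss` and `box_weight`, and the choice of L1 for the box term, are illustrative assumptions. They are not Lougat's actual implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


@dataclass
class StepOutput:
    """Hypothetical output of one decoding step: token scores plus a box."""
    token_logits: List[float]  # unnormalized scores over the vocabulary
    bbox: BBox                 # predicted location of the emitted token


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax of the target index (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def l1_box(pred: BBox, target: BBox) -> float:
    """Mean absolute error over the four box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / 4.0


def joint_loss(
    outputs: Sequence[StepOutput],
    target_tokens: Sequence[int],
    target_boxes: Sequence[BBox],
    box_weight: float = 1.0,  # hypothetical weighting between the two terms
) -> float:
    """Average over steps of token cross-entropy plus weighted box L1."""
    if not outputs:
        raise ValueError("need at least one step")
    if not (len(outputs) == len(target_tokens) == len(target_boxes)):
        raise ValueError("outputs and targets must have the same length")
    total = 0.0
    for out, tok, box in zip(outputs, target_tokens, target_boxes):
        total += cross_entropy(out.token_logits, tok) + box_weight * l1_box(out.bbox, box)
    return total / len(outputs)


# Usage: uniform logits over two tokens and an exactly matching box.
step = StepOutput(token_logits=[0.0, 0.0], bbox=(0.1, 0.1, 0.2, 0.2))
print(joint_loss([step], [0], [(0.1, 0.1, 0.2, 0.2)]))  # log(2), about 0.6931
```

`box_weight` controls how strongly localization errors count relative to token errors. A real implementation might use a different box loss, for example an IoU-based one.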

![Example Image](https://raw.githubusercontent.com/veya2ztn/Lougat/main/images/image.png)

The name "Lougat" combines LLaMA and Nougat. The key idea is a natural continuation of the paper [LOCR: Location-Guided Transformer for Optical Character Recognition](https://arxiv.org/abs/2403.02127).
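
The "pixel scan" behavior can be pictured as an autoregressive loop. At every step the model emits a token together with its box, and the box history is fed back as input for the next step. The sketch below assumes a hypothetical `step` callable that stands in for the model. The names `scan`, `StepFn` and `toy_step` are illustrative only. How LOCR or Lougat uses the predicted position inside the network is not modeled here.

```python
from typing import Callable, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]

# Hypothetical model interface: given the tokens and boxes emitted so far,
# return the next token id and the box where that token appears on the page.
StepFn = Callable[[List[int], List[BBox]], Tuple[int, BBox]]


def scan(
    step: StepFn,
    bos_id: int,
    eos_id: int,
    start_box: BBox = (0.0, 0.0, 0.0, 0.0),
    max_len: int = 4096,
) -> Tuple[List[int], List[BBox]]:
    """Autoregressive decoding that emits a (token, box) pair at every step."""
    tokens: List[int] = [bos_id]
    boxes: List[BBox] = [start_box]
    for _ in range(max_len):
        token, box = step(tokens, boxes)  # both histories condition the next step
        tokens.append(token)
        boxes.append(box)
        if token == eos_id:
            break
    # Drop the BOS token and its placeholder box.
    return tokens[1:], boxes[1:]


# Usage with a toy step function that "reads" three tokens left to right.
def toy_step(tokens: List[int], boxes: List[BBox]) -> Tuple[int, BBox]:
    if len(tokens) > 3:
        return 2, boxes[-1]  # emit EOS (id 2) at the last position
    x = boxes[-1][2]  # start where the previous box ended
    return 10 + len(tokens), (x, 0.0, x + 0.1, 0.05)


toks, bxs = scan(toy_step, bos_id=1, eos_id=2)
print(toks)  # [11, 12, 13, 2]
```

Feeding the box history back into `step` is what distinguishes this loop from plain text decoding. The model can condition on where it last read, not only on what it read.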

Current branch: the **Flougat** model.

All branches:
- Florence-2 + LLaMA → Flougat (this branch)
- SAM 2 + LLaMA → Slougat
- Nougat + LLaMA with relative position embedding → Rlougat

# Inference and Training

Please see the [Lougat repository](https://github.com/veya2ztn/Lougat).