hantian/yolo-doclaynet · Hugging Face

More details refer to Github

Introduction

You know that RAG is very popular these days. There are many applications that support talking to documents. However, there is a huge performance drop when talking to a complex document due to the complex structures. So it's a challenge to extract content from complex document and organize it into parsable form. This repo aims to solve this challenge with a fast and good performance method.

Detection Sample

Method

YOLO is the most advenced detect model developed by Ultralytics. YOLO has 5 different sizes of base model and a super powerful framework for training and deployment. So I chose YOLO to solve this challenge.
DocLayNet is a human-annotated document layout segmentation dataset containing 80863 pages from a broad variety of document sources. As far as I know, it's the most qualified document layout analysis dataset.

Usage

from ultralytics import YOLO

model = YOLO("{path to model file}")
pred = model("{path to test image}")
print(pred)

Dataset

DocLayNet can be found more details and download at this link. It has 11 labels:

Text: Regular paragraphs.
Picture: A graphic or photograph.
Caption: Special text outside a picture or table that introduces this picture or table.
Section-header: Any kind of heading in the text, except overall document title.
Footnote: Typically small text at the bottom of a page, with a number or symbol that is referred to in the text above.
Formula: Mathematical equation on its own line.
Table: Material arranged in a grid alignment with rows and columns, often with separator lines.
List-item: One element of a list, in a hanging shape, i.e., from the second line onwards the paragraph is indented more than the first line.
Page-header: Repeating elements like page number at the top, outside of the normal text flow.
Page-footer: Repeating elements like page number at the bottom, outside of the normal text flow.
Title: Overall title of a document, (almost) exclusively on the first page and typically appearing in large font.

hantian
/

yolo-doclaynet

Introduction

Detection Sample

Method

Usage

Dataset

Dataset used to train hantian/yolo-doclaynet