Document Layout Analysis

36 papers with code • 4 benchmarks • 9 datasets

"Document Layout Analysis is performed to determine physical structure of a document, that is, to determine document components. These document components can consist of single connected components-regions [...] of pixels that are adjacent to form single regions [...] , or group of text lines. A text line is a group of characters, symbols, and words that are adjacent, “relatively close” to each other and through which a straight line can be drawn (usually with horizontal or vertical orientation)." L. O'Gorman, "The document spectrum for page layout analysis," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1162-1173, Nov. 1993.
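The text-line definition quoted above (adjacent components that are "relatively close" and lie along a roughly straight line) can be illustrated with a minimal sketch. This is not O'Gorman's docstrum method; it is a simple greedy grouping of bounding boxes for horizontal lines only. The `Box` type, the `group_into_lines` function, and the `max_gap` and `min_overlap` thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one connected component (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def vertical_overlap(a, b):
    """Fraction of the shorter box's height that the two boxes share vertically."""
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    return max(shared, 0.0) / shorter if shorter > 0 else 0.0


def group_into_lines(boxes, max_gap=15.0, min_overlap=0.5):
    """Greedily chain boxes, left to right, into horizontal text lines.

    A box joins an existing line when it starts within `max_gap` of the
    line's last box ("relatively close") and overlaps it vertically by at
    least `min_overlap` (roughly on the same straight line). Both
    thresholds are illustrative assumptions, not values from the paper.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.x0):
        for line in lines:
            last = line[-1]
            close = box.x0 - last.x1 <= max_gap  # negative gap = horizontal overlap
            if close and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            lines.append([box])  # no existing line fits: start a new one
    return lines


# Two words on one row, two on a lower row, and one isolated component.
sample = [
    Box(0, 0, 10, 10), Box(12, 1, 22, 11),
    Box(0, 30, 10, 40), Box(13, 31, 25, 41),
    Box(200, 0, 210, 10),
]
lines = group_into_lines(sample)
print([len(line) for line in lines])
```

In practice the boxes would come from a connected-component pass over a binarized page image, and vertical or skewed text would need an orientation estimate first.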

Image credit: PubLayNet: largest dataset ever for document layout analysis

Benchmarks

These leaderboards track progress in Document Layout Analysis.

Dataset	Best Model
PubLayNet val	VGT
RVL-CDIP	VisualWordGrid
Document Layout Recognition Challenge test	USYD NLP_CS29-2
Document Layout Recognition Challenge mini-dev	Faster_RCNN

Libraries

Use these libraries to find Document Layout Analysis models and implementations.

Library	Papers	Stars
huggingface/transformers	6	124,889
microsoft/unilm	3	18,315
facebookresearch/data2vec_vision	3	
PaddlePaddle/PaddleOCR	2	38,458

See all 8 libraries.

Datasets

Subtasks

MS-SSIM

Most implemented papers

Training data-efficient image transformers & distillation through attention

facebookresearch/deit • 23 Dec 2020

In this work, we produce a competitive convolution-free transformer by training on Imagenet only.

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm • 31 Dec 2019

In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

BEiT: BERT Pre-Training of Image Transformers

microsoft/unilm • ICLR 2022

We first "tokenize" the original image into visual tokens.

PubLayNet: largest dataset ever for document layout analysis

ibm-aur-nlp/PubLayNet • 16 Aug 2019

Deep neural networks that are developed for computer vision have been proven to be an effective method to analyze layout of document images.

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

microsoft/unilm • ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

dhSegment: A generic deep-learning approach for document segmentation

dhlab-epfl/dhSegment • 27 Apr 2018

In recent years there have been multiple successful attempts tackling document processing problems separately by designing task specific hand-tuned strategies.

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

dhlab-epfl/dhSegment-text • 14 Feb 2020

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.

A Large Dataset of Historical Japanese Documents with Complex Layouts

dell-research-harvard/HJDataset • 18 Apr 2020

Deep learning-based approaches for automatic document layout analysis and content extraction have the potential to unlock rich information trapped in historical documents on a large scale.

CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images

mdv3101/CDeCNet • 25 Aug 2020

Localizing page elements/objects such as tables, figures, equations, etc. is the primary step in extracting information from document images.

DiT: Self-supervised Pre-training for Document Image Transformer

microsoft/unilm • 4 Mar 2022

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
