Document Image Classification
24 papers with code • 8 benchmarks • 4 datasets
Document image classification is the task of classifying documents based on images of their contents.
(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)
Most implemented papers
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Training data-efficient image transformers & distillation through attention
In this work, we produce a competitive convolution-free transformer by training on Imagenet only.
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
BEiT: BERT Pre-Training of Image Transformers
We first "tokenize" the original image into visual tokens.
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half.
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
OCR-free Document Understanding Transformer
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks
In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning.