Document Understanding
74 papers with code • 0 benchmarks • 1 dataset
Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).
Most implemented papers
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
Chargrid: Towards Understanding 2D Documents
We introduce a novel type of text representation that preserves the 2D layout of a document.
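As a rough illustration of a layout-preserving text representation, the sketch below paints each character's bounding box onto a 2D grid using an integer index for that character. The input format (`CharBox`), the index conventions (0 for background, 1 for unknown characters), and the function name `build_chargrid` are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (character, x0, y0, x1, y1) in pixel coordinates,
# where (x0, y0) is the top-left corner and (x1, y1) is exclusive.
CharBox = Tuple[str, int, int, int, int]


def build_chargrid(
    chars: List[CharBox], width: int, height: int, vocab: Dict[str, int]
) -> List[List[int]]:
    """Fill each character's box with its vocabulary index.

    Assumed conventions: 0 marks background, 1 marks a character
    that is missing from the vocabulary.
    """
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in chars:
        idx = vocab.get(ch, 1)
        # Clip the box to the page so out-of-range boxes do not raise.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid


# Two adjacent 2x2 character boxes on a 5x3 page.
example_chars = [("A", 0, 0, 2, 2), ("B", 2, 0, 4, 2)]
example_vocab = {"A": 2, "B": 3}
example_grid = build_chargrid(example_chars, width=5, height=3, vocab=example_vocab)
```

The resulting grid can be one-hot encoded per cell and passed to a convolutional model in the same way as an image, which is what lets the 2D layout survive into the model input.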
OCR-free Document Understanding Transformer
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
ICDAR 2021 Competition on Scientific Literature Parsing
Scientific literature contains important information related to cutting-edge innovations in diverse domains.
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
Message Passing Attention Networks for Document Understanding
In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD).
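To make the idea of a word co-occurrence network concrete, here is a minimal sketch that links each token to the tokens that follow it within a sliding window and counts how often each pair co-occurs. The window size, the undirected edges, and the function name `cooccurrence_graph` are assumptions for illustration; the paper's exact graph construction may differ.

```python
from collections import Counter
from typing import Counter as CounterType, List, Tuple


def cooccurrence_graph(
    tokens: List[str], window: int = 2
) -> CounterType[Tuple[str, str]]:
    """Return undirected edge weights between words that co-occur.

    Each token is linked to the next `window` tokens. Edges are stored
    as alphabetically sorted pairs so (a, b) and (b, a) share one count.
    """
    edges: CounterType[Tuple[str, str]] = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                edges[tuple(sorted((word, other)))] += 1
    return edges


example_tokens = "the cat sat on the mat".split()
example_edges = cooccurrence_graph(example_tokens, window=2)
```

The unique words are the nodes of the graph and the counts are the edge weights, so a message passing model can then propagate information between words that appear close together in the text.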
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
End-to-end Document Recognition and Understanding with Dessurt
Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks.