Document Understanding

74 papers with code • 0 benchmarks • 1 dataset

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Most implemented papers

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

microsoft/unilm 18 Apr 2021

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

microsoft/unilm ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Chargrid: Towards Understanding 2D Documents

sciencefictionlab/chargrid-pytorch EMNLP 2018

We introduce a novel type of text representation that preserves the 2D layout of a document.
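As a rough illustration of what a layout-preserving text representation can look like, the sketch below rasterizes character bounding boxes onto a 2D grid of integer character indices. It assumes character-level boxes in pixel coordinates (for example from an OCR engine); the function name `build_chargrid`, the box format, and the use of index 0 for background are hypothetical choices for this example, not the paper's reference implementation.

```python
from typing import Dict, List, Tuple

# Assumed input format: (character, x0, y0, x1, y1) in pixels,
# with x1 and y1 exclusive. This format is hypothetical.
CharBox = Tuple[str, int, int, int, int]


def build_chargrid(
    char_boxes: List[CharBox], width: int, height: int
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Fill each character's box with an integer index for that character."""
    vocab: Dict[str, int] = {}
    # Index 0 is reserved for background (no character).
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in char_boxes:
        # Assign a new index the first time a character is seen.
        idx = vocab.setdefault(ch, len(vocab) + 1)
        # Clip the box to the page so out-of-range boxes do not fail.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid, vocab


boxes = [("A", 0, 0, 2, 2), ("B", 2, 0, 4, 2), ("A", 0, 3, 2, 4)]
grid, vocab = build_chargrid(boxes, width=4, height=4)
```

The resulting grid keeps spatial relationships between characters intact, so it can be fed to image-style models (after one-hot encoding, for instance) rather than being flattened into a 1D token sequence.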

OCR-free Document Understanding Transformer

huggingface/transformers 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

ICDAR 2021 Competition on Scientific Literature Parsing

ibm-aur-nlp/PubLayNet 8 Jun 2021

Scientific literature contains important information related to cutting-edge innovations in diverse domains.

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

alibabaresearch/advancedliteratemachinery 8 Apr 2024

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

Message Passing Attention Networks for Document Understanding

giannisnik/mpad 17 Aug 2019

In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD).
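To make the "word co-occurrence network" idea concrete, here is a minimal sketch that builds such a graph from a token list with a sliding window. It covers only graph construction, not the message passing or attention steps. The function name `cooccurrence_graph`, the default window size, and the choice of undirected, count-weighted edges are assumptions for illustration and may differ from the MPAD implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cooccurrence_graph(
    tokens: List[str], window: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count co-occurrences of distinct words within a sliding window.

    `window` is the span length in tokens, so window=2 links only
    adjacent words. Edges are undirected (assumption): each pair is
    stored with its two words in sorted order.
    """
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for i, src in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            dst = tokens[j]
            if src != dst:  # skip self-loops
                a, b = sorted((src, dst))
                edges[(a, b)] += 1
    return dict(edges)


tokens = "the cat sat on the mat".split()
adjacent = cooccurrence_graph(tokens, window=2)
wider = cooccurrence_graph(tokens, window=3)
```

Each unique word becomes a node and each counted pair a weighted edge; a message passing model would then update node representations by aggregating over these edges.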

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

microsoft/unilm 16 Oct 2021

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

jpwang/lilt ACL 2022

LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.

End-to-end Document Recognition and Understanding with Dessurt

herobd/dessurt 30 Mar 2022

Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks.