TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Long-range modeling	LRA	Linformer	ListOps	35.7	# 24
Long-range modeling	LRA	Linformer	Text	53.94	# 29
Long-range modeling	LRA	Linformer	Retrieval	52.27	# 26
Long-range modeling	LRA	Linformer	Image	38.56	# 28
Long-range modeling	LRA	Linformer	Pathfinder	76.34	# 18
Long-range modeling	LRA	Performer	ListOps	18.01	# 27
Long-range modeling	LRA	Performer	Text	65.4	# 20
Long-range modeling	LRA	Performer	Retrieval	53.82	# 23
Long-range modeling	LRA	Performer	Image	42.77	# 21
Long-range modeling	LRA	Performer	Pathfinder	77.05	# 17
Long-range modeling	LRA	Performer	Avg	51.41	# 24
Long-range modeling	LRA	Local Attention	ListOps	37.27	# 20
Long-range modeling	LRA	Local Attention	Text	56.1	# 28
Long-range modeling	LRA	Local Attention	Retrieval	53.4	# 24
Long-range modeling	LRA	Local Attention	Image	38.07	# 29
Long-range modeling	LRA	Local Attention	Pathfinder	68.5	# 28
Long-range modeling	LRA	Local Attention	Avg	50.67	# 26
Long-range modeling	LRA	BigBird	ListOps	36.05	# 23
Long-range modeling	LRA	BigBird	Text	64.02	# 23
Long-range modeling	LRA	BigBird	Retrieval	59.29	# 19
Long-range modeling	LRA	BigBird	Image	40.83	# 27
Long-range modeling	LRA	BigBird	Pathfinder	74.87	# 20
Long-range modeling	LRA	BigBird	Avg	55.01	# 20
Long-range modeling	LRA	Linear Trans.	ListOps	16.13	# 29
Long-range modeling	LRA	Linear Trans.	Text	65.9	# 19
Long-range modeling	LRA	Linear Trans.	Retrieval	53.09	# 25
Long-range modeling	LRA	Linear Trans.	Image	42.34	# 23
Long-range modeling	LRA	Linear Trans.	Pathfinder	75.3	# 19
Long-range modeling	LRA	Linear Trans.	Avg	50.55	# 27
Long-range modeling	LRA	Sparse Trans.	ListOps	17.07	# 28
Long-range modeling	LRA	Sparse Trans.	Text	63.58	# 24
Long-range modeling	LRA	Sparse Trans.	Retrieval	59.59	# 18
Long-range modeling	LRA	Sparse Trans.	Image	44.24	# 19
Long-range modeling	LRA	Sparse Trans.	Pathfinder	71.71	# 22
Long-range modeling	LRA	Sparse Trans.	Avg	51.24	# 25
Long-range modeling	LRA	Sinkhorn Trans.	ListOps	33.67	# 26
Long-range modeling	LRA	Sinkhorn Trans.	Text	61.2	# 27
Long-range modeling	LRA	Sinkhorn Trans.	Image	41.23	# 26
Long-range modeling	LRA	Sinkhorn Trans.	Pathfinder	67.45	# 29
Long-range modeling	LRA	Synthesizer	ListOps	36.99	# 21
Long-range modeling	LRA	Synthesizer	Text	61.68	# 26
Long-range modeling	LRA	Synthesizer	Retrieval	54.67	# 22
Long-range modeling	LRA	Synthesizer	Image	41.61	# 25
Long-range modeling	LRA	Synthesizer	Pathfinder	69.45	# 27
Long-range modeling	LRA	Synthesizer	Avg	52.88	# 23
Long-range modeling	LRA	Longformer	ListOps	35.63	# 25
Long-range modeling	LRA	Longformer	Text	62.85	# 25
Long-range modeling	LRA	Longformer	Retrieval	56.89	# 21
Long-range modeling	LRA	Longformer	Image	42.22	# 24
Long-range modeling	LRA	Longformer	Pathfinder	69.71	# 26
Long-range modeling	LRA	Longformer	Avg	53.46	# 22
Long-range modeling	LRA	Transformer	ListOps	36.37	# 22
Long-range modeling	LRA	Transformer	Text	64.27	# 22
Long-range modeling	LRA	Transformer	Retrieval	57.46	# 20
Long-range modeling	LRA	Transformer	Image	42.44	# 22
Long-range modeling	LRA	Transformer	Pathfinder	71.4	# 23
Long-range modeling	LRA	Transformer	Avg	54.39	# 21
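
The Avg rows above appear consistent with an unweighted mean of the five task scores (ListOps, Text, Retrieval, Image, Pathfinder). The sketch below checks that reading for three models using values copied from the table. The helper name `task_average` is illustrative and is not taken from the benchmark code.

```python
from statistics import mean

# Per-task scores copied from the table above, in the order:
# ListOps, Text, Retrieval, Image, Pathfinder.
scores = {
    "Performer": [18.01, 65.4, 53.82, 42.77, 77.05],
    "BigBird": [36.05, 64.02, 59.29, 40.83, 74.87],
    "Transformer": [36.37, 64.27, 57.46, 42.44, 71.4],
}

# Avg values as listed in the table above.
reported_avg = {"Performer": 51.41, "BigBird": 55.01, "Transformer": 54.39}


def task_average(values):
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(mean(values), 2)


for model, values in scores.items():
    computed = task_average(values)
    print(f"{model}: computed {computed:.2f}, reported {reported_avg[model]:.2f}")
```

Under this assumption, a model with a missing task score (Sinkhorn Trans. has no Retrieval row here) has no directly comparable average, which may be why no Avg row is listed for it.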


Long Range Arena: A Benchmark for Efficient Transformers

8 Nov 2020 · Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Transformers do not scale well to long sequence lengths, largely because of the quadratic complexity of self-attention. In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets makes it difficult to assess relative model quality among the many models. This paper proposes a systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens, encompassing a wide range of data types and modalities such as text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models (Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers) on our newly proposed benchmark suite. LRA paves the way towards a better understanding of this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle. Our benchmark code will be released at https://github.com/google-research/long-range-arena.
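
To make the quadratic cost mentioned in the abstract concrete, the sketch below counts the entries in a full attention score matrix for a single head at sequence lengths spanning the benchmark's stated range. It assumes 1K means 1,000 tokens and ignores batch size, head count, and any implementation-specific optimizations. The function name `attention_score_entries` is hypothetical.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's full seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


# Sequence lengths chosen to span the 1K to 16K range (assuming 1K = 1,000).
for n in (1_000, 2_000, 4_000, 8_000, 16_000):
    entries = attention_score_entries(n)
    print(f"n={n:>6,}: {entries:>12,} score entries per head")
```

Doubling the sequence length quadruples the number of score entries, so a 16x longer sequence needs 256x as many.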

PDF Abstract

Code

google-research/long-range-arena official

682

google-research/bigbird

553

guyd1995/lra-benchmarks

dar-tau/lra-benchmark

guyd1995/lra-benchmark

Tasks

16k

Benchmarking

Long-range modeling

Datasets

Introduced in the Paper:

LRA

Used in the Paper:

IMDb Movie Reviews

ListOps

Results from the Paper

Ranked #18 on Long-range modeling on LRA (Pathfinder metric)


Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
