TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Long-range modeling	LRA	S4	ListOps	58.35	# 12
Long-range modeling	LRA	S4	Text	76.02	# 16
Long-range modeling	LRA	S4	Retrieval	87.09	# 12
Long-range modeling	LRA	S4	Image	87.26	# 10
Long-range modeling	LRA	S4	Pathfinder	86.05	# 12
Long-range modeling	LRA	S4	Avg	80.48	# 12
Long-range modeling	LRA	S4	Pathfinder-X	88.1	# 11
Sequential Image Classification	Sequential CIFAR-10	S4	Unpermuted Accuracy	91.80%	# 2
Sequential Image Classification	Sequential MNIST	S4	Unpermuted Accuracy	99.63%	# 2
Sequential Image Classification	Sequential MNIST	S4	Permuted Accuracy	98.70%	# 4
Speech Recognition	Speech Commands	S4	Accuracy (%)	98.32	# 2
Language Modelling	WikiText-103	S4	Test perplexity	21.28	# 45
Language Modelling	WikiText-103	S4	Number of params	249M	# 18
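
The Avg row (80.48) matches the unweighted mean of all six LRA scores listed, including Pathfinder-X. It does not match the five-task mean without Pathfinder-X. That the leaderboard defines Avg this way is an inference from the numbers, not something this page states. A quick Python check:

```python
# S4's Long Range Arena scores, copied from the table above.
lra_scores = {
    "ListOps": 58.35,
    "Text": 76.02,
    "Retrieval": 87.09,
    "Image": 87.26,
    "Pathfinder": 86.05,
    "Pathfinder-X": 88.1,
}

# Unweighted mean over all six tasks (assumed definition of "Avg").
avg_all = sum(lra_scores.values()) / len(lra_scores)

# Mean over the five tasks without Pathfinder-X, for comparison.
five = [v for k, v in lra_scores.items() if k != "Pathfinder-X"]
avg_five = sum(five) / len(five)

print(f"{avg_all:.2f}")   # 80.48
print(f"{avg_five:.2f}")  # 78.95
```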

Efficiently Modeling Long Sequences with Structured State Spaces

ICLR 2022 · Albert Gu, Karan Goel, Christopher Ré

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10,000 or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) $x'(t) = Ax(t) + Bu(t),\ y(t) = Cx(t) + Du(t)$, and showed that for appropriate choices of the state matrix $A$, this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4), based on a new parameterization for the SSM, and show that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths. Our technique involves conditioning $A$ with a low-rank correction, allowing it to be diagonalized stably and reducing the SSM to the well-studied computation of a Cauchy kernel. S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet; (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation $60\times$ faster; and (iii) state-of-the-art (SoTA) results on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
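
The abstract contrasts simulating the SSM step by step with computing it more efficiently. The following is a minimal NumPy sketch of the two standard views of a discretized SSM: a recurrence over time and a convolution with the kernel $K_k = C\bar{A}^k\bar{B}$. Several choices are assumptions made for illustration and are not stated on this page: the bilinear discretization, the step size, the HiPPO-LegS choice of $A$ and $B$, the random $C$, and every function name. The kernel is materialized naively, one matrix-vector product per step. S4's low-rank correction and Cauchy-kernel computation, which the abstract says make this efficient, are not reproduced here.

```python
import numpy as np


def hippo_legs(n):
    """HiPPO-LegS matrices (assumed choice of A and B, for illustration only)."""
    r = np.sqrt(2 * np.arange(n) + 1)
    A = -np.tril(np.outer(r, r), k=-1)  # below diagonal: -sqrt(2n+1) * sqrt(2k+1)
    A -= np.diag(np.arange(n) + 1)      # diagonal: -(n+1); zero above diagonal
    B = r.copy()                        # B_n = sqrt(2n+1)
    return A, B


def discretize_bilinear(A, B, step):
    """Bilinear (Tustin) discretization of x' = Ax + Bu with step size `step`."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - step / 2 * A)
    return inv @ (I + step / 2 * A), inv @ (step * B)


def run_recurrent(Ab, Bb, C, D, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k + D u_k."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for uk in u:
        x = Ab @ x + Bb * uk
        ys.append(C @ x + D * uk)
    return np.array(ys)


def naive_kernel(Ab, Bb, C, length):
    """Convolutional view: K = (C Bb, C Ab Bb, ..., C Ab^(L-1) Bb).

    Built with repeated matrix-vector products; this is a naive
    stand-in, not S4's Cauchy-kernel algorithm."""
    K = np.empty(length)
    v = Bb.copy()
    for k in range(length):
        K[k] = C @ v
        v = Ab @ v
    return K


def run_convolutional(K, D, u):
    """Causal convolution y_k = sum_{j<=k} K_j u_{k-j} + D u_k."""
    return np.convolve(u, K)[: len(u)] + D * u


rng = np.random.default_rng(0)
N, L = 8, 64                      # state size and sequence length (arbitrary)
A, B = hippo_legs(N)
C = rng.standard_normal(N)        # random output projection (assumption)
D = 0.5
u = rng.standard_normal(L)
Ab, Bb = discretize_bilinear(A, B, step=0.1)
y_rec = run_recurrent(Ab, Bb, C, D, u)
y_conv = run_convolutional(naive_kernel(Ab, Bb, C, L), D, u)
print(np.allclose(y_rec, y_conv))  # True
```

Both views produce the same outputs. The recurrent form suits step-by-step generation, and the convolutional form suits parallel training. What separates the approaches the abstract compares is how cheaply the kernel $K$ can be computed.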

Code

hazyresearch/state-spaces (official): 2,111 stars

state-spaces/s4: 2,108 stars

srush/annotated-s4: 433 stars

lindermanlab/S5: 214 stars

ag1988/dss

See all 9 implementations

Tasks

16k

Data Augmentation

Language Modelling

Long-range modeling

Sequential Image Classification

Speech Recognition

Datasets

MNIST

WikiText-2

WikiText-103

Speech Commands

LRA

Results from the Paper

Ranked #2 on Sequential Image Classification on Sequential CIFAR-10

Methods

1x1 Convolution • Average Pooling • Batch Normalization • Bottleneck Residual Block • Convolution • Global Average Pooling • Kaiming Initialization • Max Pooling • ReLU • Residual Block • Residual Connection • ResNet
