TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Language Modelling	One Billion Word	OmniNetP (Large)	PPL	21.6	# 2
Language Modelling	One Billion Word	OmniNetP (Large)	Number of params	100M	# 21
Language Modelling	One Billion Word	OmniNetB (Large)	PPL	22	# 4
Language Modelling	One Billion Word	OmniNetT (Large)	PPL	21.5	# 1
Language Modelling	One Billion Word	OmniNetT (Large)	Number of params	100M	# 21
Machine Translation	WMT2014 English-French	OmniNetP	BLEU score	42.6	# 17
Machine Translation	WMT2014 English-French	OmniNetP	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-French	OmniNetP	Operations per network pass	None	# 1
Machine Translation	WMT2014 English-German	OmniNetP	BLEU score	29.8	# 16
Machine Translation	WMT2014 English-German	OmniNetP	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-German	OmniNetP	Operations per network pass	None	# 1
Machine Translation	WMT2017 Chinese-English	OmniNetP	BLEU	23.0	# 3
Machine Translation	WMT2017 English-Finnish	OmniNetP	BLEU	20.9	# 1
Machine Translation	WMT2017 English-French	OmniNetP	BLEU	43.1	# 1
Machine Translation	WMT2017 English-German	OmniNetP	BLEU	29.0	# 1
Machine Translation	WMT2017 Russian-English	OmniNetP	BLEU	36.2	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/language-modelling-on-one-billion-word)](https://paperswithcode.com/sota/language-modelling-on-one-billion-word?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2017-english)](https://paperswithcode.com/sota/machine-translation-on-wmt2017-english?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2017-english-french)](https://paperswithcode.com/sota/machine-translation-on-wmt2017-english-french?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2017-english-german)](https://paperswithcode.com/sota/machine-translation-on-wmt2017-english-german?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2017-russian)](https://paperswithcode.com/sota/machine-translation-on-wmt2017-russian?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2017-chinese)](https://paperswithcode.com/sota/machine-translation-on-wmt2017-chinese?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2014-english-german)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-german?p=omninet-omnidirectional-representations-from)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/omninet-omnidirectional-representations-from/machine-translation-on-wmt2014-english-french)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-french?p=omninet-omnidirectional-representations-from)`

OmniNet: Omnidirectional Representations from Transformers

1 Mar 2021 · Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This process can also be interpreted as a form of extreme or intensive attention mechanism that has the receptive field of the entire width and depth of the network. To this end, the omnidirectional attention is learned via a meta-learner, which is essentially another self-attention based model. In order to mitigate the computationally expensive costs of full receptive field attention, we leverage efficient self-attention models such as kernel-based (Choromanski et al.), low-rank attention (Wang et al.) and/or Big Bird (Zaheer et al.) as the meta-learner. Extensive experiments are conducted on autoregressive language modeling (LM1B, C4), Machine Translation, Long Range Arena (LRA), and Image Recognition. The experiments show that OmniNet achieves considerable improvements across these tasks, including achieving state-of-the-art performance on LM1B, WMT'14 En-De/En-Fr, and Long Range Arena. Moreover, using omnidirectional representation in Vision Transformers leads to significant improvements on image recognition tasks on both few-shot learning and fine-tuning setups.
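
The core idea in the abstract, every token attending to every token at every layer, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the single-head plain softmax attention standing in for the efficient meta-learner (kernel-based, low-rank, or Big Bird), and the max-pooling over depth are all assumptions made for illustration. Causal masking, which autoregressive tasks would need, is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Plain single-head scaled dot-product attention.
    # Assumption: stands in for the efficient attention used as the meta-learner.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def omnidirectional_attention(layer_states, wq, wk, wv):
    # layer_states: (num_layers, seq_len, d_model), the hidden states of every layer.
    num_layers, seq_len, d_model = layer_states.shape
    # Flatten depth and width so each token can attend to all tokens in all layers.
    flat = layer_states.reshape(num_layers * seq_len, d_model)
    attended = self_attention(flat, wq, wk, wv)
    # Assumption: reduce back to one vector per position by max-pooling over depth.
    attended = attended.reshape(num_layers, seq_len, -1)
    return attended.max(axis=0)

# Hypothetical toy sizes: 3 layers, 4 tokens, model width 8.
rng = np.random.default_rng(0)
states = rng.normal(size=(3, 4, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = omnidirectional_attention(states, wq, wk, wv)
print(out.shape)
```

The attention matrix here has (num_layers × seq_len)² entries, which is the cost the abstract refers to when it motivates an efficient attention model as the meta-learner.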

Code

lucidrains/omninet-pytorch

Tasks

Few-Shot Learning

Language Modelling

Machine Translation

Translation

Datasets

CIFAR-10

CIFAR-100

Oxford 102 Flower

WMT 2014

LRA

Billion Word Benchmark

Results from the Paper

Ranked #1 on Machine Translation on WMT2017 Russian-English
