Texts

BIG-bench (Beyond the Imitation Game Benchmark)

Introduced by Srivastava et al. in Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. Big-bench include more than 200 tasks.

Task	Dataset Variant	Best Model
Multi-task Language Understanding	BBH-nlp	Flan-PaLM 540B
Logical Reasoning	BIG-bench (Reasoning About Colored Objects)	PaLM 2
Logical Reasoning	BIG-bench (Penguins In A Table)	PaLM 2
Multiple Choice Question Answering (MCQA)	BIG-bench (Navigate)	PaLM 2
Multiple Choice Question Answering (MCQA)	BIG-bench (Movie Recommendation)	PaLM 2
Multiple Choice Question Answering (MCQA)	BIG-bench (Hyperbaton)	Bloomberg GPT
Logical Reasoning	BIG-bench (Formal Fallacies Syllogisms Negation)	PaLM 2
Common Sense Reasoning	BIG-bench (Date Understanding)	PaLM 2
Common Sense Reasoning	BIG-bench (Causal Judgment)	PaLM 2
Common Sense Reasoning	BIG-bench (Disambiguation QA)	PaLM 2
Logical Reasoning	BIG-bench (Temporal Sequences)	PaLM 2
Multiple Choice Question Answering (MCQA)	BIG-bench (Ruin Names)	PaLM 2
Common Sense Reasoning	BIG-bench (Sports Understanding)	PaLM 2
Sarcasm Detection	BIG-bench (SNARKS)	PaLM 2
Multi-task Language Understanding	BBH-alg	code-davinci-002 175B
Word Sense Disambiguation	BIG-bench (Anachronisms)	Chinchilla-70B
Multiple Choice Question Answering (MCQA)	BIG-bench (Novel Concepts)	PaLM-540B
Crass AI	BIG-bench	Orca 2-13B
Logical Reasoning	BIG-bench (Logic Grid Puzzle)	Chinchilla-70B
Common Sense Reasoning	BIG-bench (Winowhy)	PaLM-540B
Logical Reasoning	BIG-bench (StrategyQA)	PaLM-540B
Memorization	BIG-bench (Hindu Knowledge)	PaLM-540B
Auto Debugging	Big-bench Lite	PaLM 62B
Language Modelling	BIG-bench-lite	GLM-130B
Common Sense Reasoning	BIG-bench (Known Unknowns)	PaLM-540B
Timedial	BIG-bench	Chinchilla-70B
Understanding Fables	BIG-bench	Chinchilla-70B
Dark Humor Detection	BIG-bench	Gopher-280B
Identify Odd Metapor	BIG-bench	Chinchilla-70B
Odd One Out	BIG-bench	Chinchilla-70B
Human Organs Senses Multiple Choice	BIG-bench	Chinchilla-70B
Moral Permissibility	BIG-bench	Chinchilla-70B
Crash Blossom	BIG-bench	Gopher-280B
GRE Reading Comprehension	BIG-bench	Chinchilla-70B
Similarities Abstraction	BIG-bench	Chinchilla-70B
English Proverbs	BIG-bench	Chinchilla-70B
Irony Identification	BIG-bench	Chinchilla-70B
Empirical Judgments	BIG-bench	Chinchilla-70B
Implicatures	BIG-bench	Chinchilla-70B
Discourse Marker Prediction	BIG-bench	Chinchilla-70B
Presuppositions As NLI	BIG-bench	Chinchilla-70B
Physical Intuition	BIG-bench	Chinchilla-70B
Metaphor Boolean	BIG-bench	Chinchilla-70B
Logical Args	BIG-bench	Gopher-280B
Evaluating Information Essentiality	BIG-bench	Chinchilla-70B
Epistemic Reasoning	BIG-bench	Chinchilla-70B
Entailed Polarity	BIG-bench	Chinchilla-70B
Analytic Entailment	BIG-bench	Chinchilla-70B
General Knowledge	BIG-bench	Chinchilla-70B
Sentence Ambiguity	BIG-bench	Chinchilla-70B
Misconceptions	BIG-bench	Chinchilla-70B
Question Selection	BIG-bench	Chinchilla-70B
Phrase Relatedness	BIG-bench	Chinchilla-70B
Nonsense Words Grammar	BIG-bench	Chinchilla-70B
Movie Dialog Same Or Different	BIG-bench	Chinchilla-70B
LAMBADA	BIG-bench	Chinchilla-70B
Intent Recognition	BIG-bench	Chinchilla-70B
Implicit Relations	BIG-bench	Chinchilla-70B
Analogical Similarity	BIG-bench	Chinchilla-70B
Logical Reasoning	BIG-bench (Logical Fallacy Detection)	Chinchilla-70B
Common Sense Reasoning	BIG-bench (Logical Sequence)	Chinchilla-70B
Fantasy Reasoning	BIG-bench	Chinchilla-70B
Figure Of Speech Detection	BIG-bench	Chinchilla-70B
Riddle Sense	BIG-bench	Chinchilla-70B
Physics MC	BIG-bench	Chinchilla-70B
Mathematical Induction	BIG-bench	Gopher-280B
	BIG-bench (Ruin Names)	CoT-T5 11B
	BIG-bench (SNARKS)	CoT-T5 11B
	Big-bench Hard	CoT-T5 11B
Business Ethics	BIG-bench	Gopher-280B
Moral Disputes	BIG-bench	Gopher-280B
Moral Scenarios	BIG-bench	Gopher-280B
FEVER (2-way)	BIG-bench	Gopher-280B
FEVER (3-way)	BIG-bench	Gopher-280B
Global Facts	BIG-bench	Gopher-280B
Miscellaneous	BIG-bench	Gopher-280B
Natural Questions	BIG-bench	Gopher-280B
TriviaQA	BIG-bench	Gopher-280B
High School European History	BIG-bench	Gopher-280B
High School US History	BIG-bench	Gopher-280B
High School World History	BIG-bench	Gopher-280B
International Law	BIG-bench	Gopher-280B
Jurisprudence	BIG-bench	Gopher-280B
Logical Fallacies	BIG-bench	Gopher-280B
Management	BIG-bench	Gopher-280B
Marketing	BIG-bench	Gopher-280B
Philosophy	BIG-bench	Gopher-280B
Prehistory	BIG-bench	Gopher-280B
Professional Law	BIG-bench	Gopher-280B
World Religions	BIG-bench	Gopher-280B
Abstract Algebra	BIG-bench	GAL 30B
College Mathematics	BIG-bench	GAL 120B
Elementary Mathematics	BIG-bench	Chinchilla
Formal Logic	BIG-bench	Gopher-280B
High School Mathematics	BIG-bench	GAL 120B
Professional Accounting	BIG-bench	Gopher-280B
Anatomy	BIG-bench	Gopher-280B
Clinical Knowledge	BIG-bench	Gopher-280B
College Medicine	BIG-bench	Gopher-280B
Human Aging	BIG-bench	Gopher-280B
Medical Genetics	BIG-bench	GAL 30B
Nutrition	BIG-bench	Gopher-280B
Professional Medicine	BIG-bench	Gopher-280B
Virology	BIG-bench	Gopher-280B
RACE-h	BIG-bench	Gopher-280B
RACE-m	BIG-bench	Gopher-280B
Astronomy	BIG-bench	Chinchilla
College Biology	BIG-bench	Chinchilla
College Chemistry	BIG-bench	Chinchilla
College Computer Science	BIG-bench	Chinchilla
College Physics	BIG-bench	Chinchilla
Computer Security	BIG-bench	Gopher-280B
Conceptual Physics	BIG-bench	Gopher-280B
Electrical Engineering	BIG-bench	GAL 120B
High School Biology	BIG-bench	Chinchilla
High School Chemistry	BIG-bench	Chinchilla
High School Computer Science	BIG-bench	GAL 120B
High School Physics	BIG-bench	Chinchilla
High School Statistics	BIG-bench	Chinchilla
BIG-bench Machine Learning	BIG-bench	Gopher-280B
Econometrics	BIG-bench	Gopher-280B
High School Geography	BIG-bench	Gopher-280B
High School Government and Politics	BIG-bench	Gopher-280B
High School Macroeconomics	BIG-bench	Gopher-280B
High School Microeconomics	BIG-bench	Gopher-280B
High School Psychology	BIG-bench	Gopher-280B
Human Sexuality	BIG-bench	Gopher-280B
Professional Psychology	BIG-bench	Gopher-280B
Public Relations	BIG-bench	Gopher-280B
Security Studies	BIG-bench	Gopher-280B
Sociology	BIG-bench	Gopher-280B
US Foreign Policy	BIG-bench	Gopher-280B
	BIG-bench (Hyperbaton)	CoT-T5 11B
	BIG-bench (Navigate)	CoT-T5 11B

BIG-bench (Beyond the Imitation Game Benchmark)

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit

Similar Datasets

HELM

BBH

MGSM

GSM8K

Usage

License Edit

Modalities Edit

Languages Edit

Benchmarks

Add a new result Link an existing benchmark

Dataset Loaders

Add Remove

Tasks

License

Modalities

Languages