MiniF2F is a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses.
26 PAPERS • 2 BENCHMARKS
MED is a new evaluation dataset that covers a wide range of monotonicity reasoning that was created by crowdsourcing and collected from linguistics publications. The dataset was constructed by collecting naturally-occurring examples by crowdsourcing and well-designed ones from linguistics publications. It consists of 5,382 examples.
18 PAPERS • 1 BENCHMARK
A new large-scale geometry problem-solving dataset - 3,002 multi-choice geometry problems - dense annotations in formal language for the diagrams and text - 27,213 annotated diagram logic forms (literals) - 6,293 annotated text logic forms (literals)
17 PAPERS • 1 BENCHMARK
HolStep is a dataset based on higher-order logic (HOL) proofs, for the purpose of developing new machine learning-based theorem-proving strategies.
10 PAPERS • 2 BENCHMARKS
This relational database consists of 24 unique names in two families (they have equivalent structures).
10 PAPERS • NO BENCHMARKS YET
The official HOList benchmark for automated theorem proving consists of all theorem statements in the core, complex, and flyspeck corpora. The goal of the benchmark is to prove as many theorems as possible in the HOList environment in the order they appear in the database. That is, only theorems that occur before the current theorem are supposed to be used as premises (lemmata) in its proof.
8 PAPERS • 1 BENCHMARK
ProofNet is a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmarks consists of 371 examples, each consisting of a formal theorem statement in Lean 3, a natural language theorem statement, and natural language proof. The problems are primarily drawn from popular undergraduate pure mathematics textbooks and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology.
5 PAPERS • NO BENCHMARKS YET
GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.
1 PAPER • NO BENCHMARKS YET