FEVER is a publicly available dataset for fact extraction and verification against textual sources.
415 PAPERS • 2 BENCHMARKS
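A minimal sketch of the shape of a single FEVER example, written in Python. The field names follow the commonly distributed JSONL release, but the concrete claim, IDs, and evidence below are invented for illustration.

# Illustrative FEVER-style record; IDs, claim, and evidence are made up.
fever_example = {
    "id": 101,
    "claim": "Example claim about a Wikipedia entity.",
    "label": "SUPPORTS",  # SUPPORTS / REFUTES / NOT ENOUGH INFO
    "evidence": [
        # each evidence set lists [annotation_id, evidence_id, page, sentence_index]
        [[1, 1, "Example_page", 3]],
    ],
}

def is_verifiable(example: dict) -> bool:
    # NOT ENOUGH INFO claims carry no usable evidence annotation
    return example["label"] != "NOT ENOUGH INFO"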
KILT (Knowledge Intensive Language Tasks) is a benchmark consisting of 11 datasets representing 5 types of tasks: fact-checking, entity linking, slot filling, open-domain question answering, and dialogue.
102 PAPERS • 12 BENCHMARKS
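The 5 task types and 11 datasets can be summarised as a simple mapping; the grouping below follows the published KILT benchmark.

# KILT: 5 task types covering 11 datasets.
KILT_TASKS = {
    "fact_checking": ["FEVER"],
    "entity_linking": ["AIDA CoNLL-YAGO", "WNED-WIKI", "WNED-CWEB"],
    "slot_filling": ["T-REx", "Zero-Shot RE"],
    "open_domain_qa": ["Natural Questions", "HotpotQA", "TriviaQA", "ELI5"],
    "dialogue": ["Wizard of Wikipedia"],
}

assert sum(len(datasets) for datasets in KILT_TASKS.values()) == 11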
The VitaminC dataset contains more than 450,000 claim-evidence pairs for fact verification and factually consistent generation. It is based on over 100,000 revisions to popular Wikipedia pages, augmented with additional "synthetic" revisions.
38 PAPERS • NO BENCHMARKS YET
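A minimal sketch of a VitaminC-style claim-evidence pair built around a Wikipedia revision. The field names and values are illustrative, not the exact released schema.

# Illustrative VitaminC-style pair: the evidence sentence comes from a
# (real or synthetic) Wikipedia revision; the label states whether it
# supports the claim. Values are invented for illustration.
vitaminc_pair = {
    "claim": "The film grossed over 100 million dollars.",
    "evidence": "As of 2020, the film had grossed 112 million dollars worldwide.",
    "label": "SUPPORTS",      # SUPPORTS / REFUTES / NOT ENOUGH INFO
    "revision_type": "real",  # real Wikipedia revision or "synthetic" edit
}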
FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
35 PAPERS • NO BENCHMARKS YET
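A minimal sketch of a FEVEROUS-style record showing how one evidence set can mix sentences and table cells. The element IDs and claim are invented; the exact nesting is an assumption for illustration.

# Illustrative FEVEROUS-style record: evidence elements are Wikipedia
# element IDs, and sentence and table-cell IDs may appear together.
feverous_example = {
    "claim": "Example claim combining prose and tabular facts.",
    "label": "SUPPORTS",  # SUPPORTS / REFUTES / NOT ENOUGH INFO
    "evidence": [
        {
            "content": [
                "Example_page_sentence_2",  # a sentence on the page
                "Example_page_cell_0_3_1",  # a cell of table 0, row 3, column 1
            ],
        },
    ],
}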
CREAK is a testbed for commonsense reasoning about entity knowledge, bridging fact-checking about entities with commonsense inferences.
23 PAPERS • NO BENCHMARKS YET
A dataset of pairs, each consisting of a multimodal tweet and its corresponding fact-checking (FC) article from snopes.com.
20 PAPERS • 1 BENCHMARK
HoVer is a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim and to classify whether the claim is Supported or Not-Supported by those facts. The claims require evidence to be extracted from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes.
18 PAPERS • NO BENCHMARKS YET
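A minimal sketch of a HoVer-style example whose supporting facts point into several Wikipedia articles (three hops here). Titles, sentence indices, and the hop count are invented for illustration.

# Illustrative HoVer-style record: verification requires facts drawn from
# multiple Wikipedia articles; each supporting fact is [title, sentence index].
hover_example = {
    "claim": "Example claim whose verification requires hopping across articles.",
    "label": "SUPPORTED",  # SUPPORTED / NOT_SUPPORTED
    "num_hops": 3,
    "supporting_facts": [
        ["Article_A", 0],
        ["Article_B", 2],
        ["Article_C", 1],
    ],
}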
A dataset of pairs, each consisting of a multimodal tweet and its corresponding fact-checking (FC) article from politifact.com.
16 PAPERS • 1 BENCHMARK
X-FACT is a large, publicly available multilingual dataset for factual verification of naturally occurring real-world claims. It contains short statements in 25 languages, labeled for veracity by expert fact-checkers, and includes a multilingual evaluation benchmark that measures both out-of-domain generalization and the zero-shot capabilities of multilingual models.
13 PAPERS • 1 BENCHMARK
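A minimal sketch of an X-FACT-style instance. The language code, label value, site, and field names are assumptions for illustration; only the overall shape (a short claim in one of the 25 languages plus an expert veracity label) follows the description above.

# Illustrative X-FACT-style instance; all concrete values are invented.
xfact_example = {
    "language": "de",
    "claim": "Beispielaussage, die von Faktenpruefern bewertet wurde.",
    "label": "mostly_false",          # expert-assigned veracity label
    "evidence": ["Snippet from a retrieved web page supporting the verdict."],
    "site": "example-factcheck.org",  # hypothetical source fact-checking site
}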
FaVIQ (Fact Verification from Information-seeking Questions) is a challenging and realistic fact verification dataset that reflects confusions raised by real users. It exploits the ambiguity in information-seeking questions and their disambiguations to automatically construct true and false claims. These claims are natural and require a complete understanding of the evidence for verification. FaVIQ serves as a challenging benchmark for natural language understanding, and training on it improves performance on professional fact-checking.
12 PAPERS • NO BENCHMARKS YET
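A small worked illustration of the conversion FaVIQ describes, using an invented ambiguous question: pairing an interpretation with its own answer yields a true (supported) claim, while pairing it with the answer to a different interpretation of the same question yields a false (refuted) claim.

# Hypothetical illustration of FaVIQ-style claim construction from an
# ambiguous information-seeking question; all strings are invented.
ambiguous_question = "When was the product released?"
interpretations = {
    "When was the product first announced?": "2018",
    "When did the product go on sale?": "2019",
}

# True claim: an interpretation paired with its own answer.
true_claim = "The product went on sale in 2019."   # supported
# False claim: the same interpretation paired with the other answer.
false_claim = "The product went on sale in 2018."  # refuted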
A large-scale dataset consisting of 21,184 claims, each assigned a truthfulness label and a ruling statement, with 58,523 pieces of evidence in the form of text and images. It supports end-to-end multimodal fact-checking and explanation generation: the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence, predicting a truthfulness label (i.e., support, refute, or not enough information), and generating a rationalization statement that explains the reasoning and ruling process.
4 PAPERS • NO BENCHMARKS YET
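A minimal sketch of the end-to-end task interface described above (retrieve evidence from web sources, predict a truthfulness label, generate a ruling statement). The class and function names are placeholders, not an API shipped with the dataset.

from dataclasses import dataclass, field
from typing import List

@dataclass
class WebSource:
    kind: str     # "article" | "image" | "video" | "tweet"
    content: str  # text, caption, transcript, or a URI

@dataclass
class Verdict:
    label: str                                            # support / refute / not enough information
    evidence: List[WebSource] = field(default_factory=list)
    explanation: str = ""                                 # generated ruling statement

def check_claim(claim: str, sources: List[WebSource]) -> Verdict:
    # Sketch only: a real system would retrieve relevant evidence, classify
    # the claim against it, and generate a rationalization.
    retrieved = sources[:5]  # stand-in for an evidence retriever
    return Verdict(label="not enough information",
                   evidence=retrieved,
                   explanation="Placeholder rationalization.")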
We present DANFEVER, a dataset intended for claim verification in Danish. The dataset builds upon the task framing of the FEVER fact extraction and verification challenge. DANFEVER can be used to build models for detecting mis- and disinformation in Danish, as well as for verification in multilingual settings.
3 PAPERS • 1 BENCHMARK
FACTIFY is a dataset for multi-modal fact verification. Each instance contains a textual claim and claim image together with a reference textual document and document image. The task is to classify claims into support, not-enough-evidence, and refute categories with the help of the supporting data. We aim to combat fake news in the social media era by providing this multi-modal dataset. FACTIFY contains 50,000 claims accompanied by 100,000 images, split into training, validation, and test sets.
3 PAPERS • NO BENCHMARKS YET
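A minimal sketch of a FACTIFY-style multimodal instance: a textual claim plus claim image paired with a reference document and its image, with a coarse three-way label as described above. Field names, paths, and values are invented for illustration.

# Illustrative FACTIFY-style instance; file paths and text are made up.
factify_example = {
    "claim_text": "Example claim circulating on social media.",
    "claim_image": "images/claim_00001.jpg",
    "document_text": "Reference article text used as supporting data.",
    "document_image": "images/document_00001.jpg",
    "label": "support",  # support / not-enough-evidence / refute
}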
Intermediate annotations from the FEVER dataset that describe original facts extracted from Wikipedia and the mutations that were applied, yielding the claims in FEVER.
1 PAPER • NO BENCHMARKS YET