BEIR (Benchmarking IR) is a heterogeneous benchmark comprising diverse information retrieval (IR) tasks. BEIR makes it possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches.
198 PAPERS • 19 BENCHMARKS
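As a concrete illustration, a single BEIR dataset can be loaded with the official `beir` Python package; the sketch below downloads SciFact and loads its corpus, queries, and relevance judgments. The download URL pattern and loader API follow the project's README at the time of writing and may change.

```python
# Minimal sketch: load one BEIR dataset (SciFact) with the `beir` package.
# Assumes `pip install beir`; URL pattern follows the project's README.
from beir import util
from beir.datasets.data_loader import GenericDataLoader

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title": ..., "text": ...}
# queries: query_id -> query text
# qrels:   query_id -> {doc_id: relevance}
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
print(len(corpus), len(queries))
```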
The VitaminC dataset contains more than 450,000 claim-evidence pairs for fact verification and factually consistent generation. The pairs are based on over 100,000 revisions to popular Wikipedia pages, plus additional "synthetic" revisions.
38 PAPERS • NO BENCHMARKS YET
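VitaminC is distributed, among other places, on the Hugging Face Hub; the sketch below inspects one claim-evidence pair. The dataset identifier and field names are assumptions based on the public release and should be verified before use.

```python
# Sketch: inspect a VitaminC claim-evidence pair via Hugging Face `datasets`.
# The dataset id and field names are assumptions based on the public release
# (https://huggingface.co/datasets/tals/vitaminc); verify before relying on them.
from datasets import load_dataset

vitaminc = load_dataset("tals/vitaminc", split="train")
example = vitaminc[0]
print(example["claim"])     # the claim to verify
print(example["evidence"])  # the (possibly revised) Wikipedia evidence
print(example["label"])     # e.g. SUPPORTS / REFUTES / NOT ENOUGH INFO
```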
A large-scale dataset consisting of 21,184 claims, each assigned a truthfulness label and a ruling statement, together with 58,523 pieces of evidence in the form of text and images. It supports end-to-end multimodal fact-checking and explanation generation: the input is a claim and a large collection of web sources (articles, images, videos, and tweets), and the goal is to assess the truthfulness of the claim by retrieving relevant evidence, predicting a truthfulness label (support, refute, or not enough information), and generating a rationalization statement that explains the reasoning and ruling process.
4 PAPERS • NO BENCHMARKS YET
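The end-to-end task can be pictured as one record per claim. The dataclasses below only illustrate the input/output structure described above; every name is invented for illustration and is not the dataset's actual schema.

```python
# Hypothetical record structure for the end-to-end multimodal fact-checking
# task described above; all field names are illustrative, not the real schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class FactCheckInstance:
    claim: str                                             # input claim
    web_sources: List[str] = field(default_factory=list)   # articles, tweets, ...
    image_paths: List[str] = field(default_factory=list)   # candidate image evidence

@dataclass
class FactCheckOutput:
    retrieved_evidence: List[str]  # text/image evidence supporting the verdict
    label: str                     # "support" | "refute" | "not enough information"
    rationalization: str           # generated statement explaining the ruling
```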
CoVERT is a fact-checked corpus of tweets focused on biomedicine and COVID-19-related (mis)information. The corpus consists of 300 tweets, each annotated with medical named entities and relations. It employs a novel crowdsourcing methodology in which crowdworkers annotate every tweet with a fact-checking label and supporting evidence that they search for online. This methodology yields moderate inter-annotator agreement.
3 PAPERS • NO BENCHMARKS YET
Stanceosaurus is a corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims. The claims in Stanceosaurus originate from 15 fact-checking sources that cover diverse geographical regions and cultures. Unlike existing stance datasets, it introduces a more fine-grained 5-class labeling strategy with additional subcategories to distinguish implicit stance.
Spiced is a paraphrase dataset of scientific findings annotated for degree of information change. It contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and the full texts of the original papers.
1 PAPER • NO BENCHMARKS YET
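A naive baseline for the information-change task is to score a finding pair by embedding similarity. The sketch below uses `sentence-transformers` purely as an illustration; the model choice and the use of cosine similarity as a proxy are assumptions, not the dataset authors' method.

```python
# Naive illustration: score a scientific-finding pair by cosine similarity
# as a rough proxy for information change. Not the Spiced authors' method;
# the model choice is an arbitrary assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

paper_finding = "The drug reduced symptoms in 40% of trial participants."
news_finding = "New drug cures the disease, study shows."

emb = model.encode([paper_finding, news_finding], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"cosine similarity: {similarity:.3f}")  # lower similarity suggests larger change
```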
The peer-reviewed publication for this dataset was presented at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022) and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite it when using the dataset.
0 PAPERS • NO BENCHMARKS YET
STVD-FC is the largest public dataset for political content analysis and fact-checking tasks. It consists of more than 1,200 fact-checked claims scraped, with associated metadata, from a fact-checking service. For the video counterpart, the dataset contains nearly 6,730 TV programs with metadata, totaling 6,540 hours. The programs were collected during the 2022 French presidential election using a dedicated workstation and protocol. The dataset is delivered in several parts, with proper indexes, to keep its 2 TB of data accessible. More information about the STVD-FC dataset can be found in the publication [1].