25 dataset results for Fake News Detection

LIAR

LIAR is a publicly available dataset for fake news detection. It contains 12.8K manually labeled short statements, collected over a decade in various contexts from POLITIFACT.COM, which provides a detailed analysis report and links to source documents for each case. The dataset can also be used for fact-checking research. It is an order of magnitude larger than the previously largest public fake news datasets of a similar type. The statements were obtained through POLITIFACT.COM’s API, and each one is evaluated by a POLITIFACT.COM editor for its truthfulness.

108 PAPERS • 1 BENCHMARK

RealNews

RealNews is a large corpus of news articles scraped from Common Crawl, limited to the 5000 news domains indexed by Google News. The authors used the Newspaper Python library to extract the body and metadata from each article. Articles from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, RealNews is 120 gigabytes uncompressed.
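The time-based train/evaluation split described for RealNews can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each article carries a publication date and uses that date alone, whereas the description defines the training set by Common Crawl dump date. The function name `assign_split` and the boundary constants are hypothetical.

```python
from datetime import date
from typing import Optional

# Boundaries taken from the description above: training covers December 2016
# through March 2019, evaluation covers April 2019.
# Assumption: we split on publication date only, ignoring which dump an
# article came from.
TRAIN_START = date(2016, 12, 1)
TRAIN_END = date(2019, 3, 31)
EVAL_START = date(2019, 4, 1)
EVAL_END = date(2019, 4, 30)


def assign_split(published: date) -> Optional[str]:
    """Return "train", "eval", or None if the date falls outside both windows."""
    if TRAIN_START <= published <= TRAIN_END:
        return "train"
    if EVAL_START <= published <= EVAL_END:
        return "eval"
    return None


if __name__ == "__main__":
    for d in (date(2018, 5, 1), date(2019, 4, 15), date(2019, 5, 1)):
        print(d.isoformat(), assign_split(d))
```

Splitting by time rather than at random keeps evaluation articles strictly later than training articles, which avoids leaking future events into training.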

75 PAPERS • NO BENCHMARKS YET

Weibo NER

The Weibo NER dataset is a Chinese Named Entity Recognition dataset drawn from the social media website Sina Weibo.

51 PAPERS • 2 BENCHMARKS

FakeNewsNet

FakeNewsNet is collected from two fact-checking websites, GossipCop and PolitiFact, and contains news content with labels annotated by professional journalists and experts, along with social context information.

26 PAPERS • NO BENCHMARKS YET

Snopes

Fact-checking (FC) articles from snopes.com, provided as pairs of a multimodal tweet and an FC article.

20 PAPERS • 1 BENCHMARK

FNC-1 (Fake News Challenge Stage 1)

FNC-1 was designed as a stance detection dataset and contains 75,385 labeled headline and article pairs. Each pair is labeled as one of agree, disagree, discuss, or unrelated. Each headline in the dataset is phrased as a statement.
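One way to represent an FNC-1 example in code is a headline, an article body, and one of the four stance labels. The sketch below is illustrative only: the class and field names (`Stance`, `HeadlineArticlePair`, `article_body`, `parse_stance`) are hypothetical and do not reflect the column names of the released files.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """The four stance labels named in the description above."""
    AGREE = "agree"
    DISAGREE = "disagree"
    DISCUSS = "discuss"
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class HeadlineArticlePair:
    """One labeled example; field names are hypothetical."""
    headline: str
    article_body: str
    stance: Stance


def parse_stance(raw: str) -> Stance:
    """Normalize a raw label string; raises ValueError for unknown labels."""
    return Stance(raw.strip().lower())


if __name__ == "__main__":
    pair = HeadlineArticlePair(
        headline="Example headline phrased as a statement",
        article_body="Example article text.",
        stance=parse_stance("Discuss"),
    )
    print(pair.stance.value)
```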

18 PAPERS • 2 BENCHMARKS

PolitiFact

Fact-checking (FC) articles from politifact.com, provided as pairs of a multimodal tweet and an FC article.

16 PAPERS • 1 BENCHMARK

COVID-19 Fake News Dataset

COVID-19 Fake News Dataset (COVID19 Fake News Detection in English)

Along with the COVID-19 pandemic, we are also fighting an "infodemic". Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm, and this is further exacerbated during a pandemic. To tackle this, we curate and release a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19. We benchmark the annotated dataset with four machine learning baselines: Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM). We obtain the best performance, a 93.46% F1-score, with SVM.
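For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes a binary F1 for a chosen positive class on toy labels. It is an assumption that this variant matches the reported figure: the description does not say whether the score is binary, macro-averaged, or weighted, and the function and label names here are hypothetical.

```python
def f1_score(y_true, y_pred, positive="fake"):
    """Binary F1 for the `positive` class: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels, not drawn from the dataset.
    y_true = ["fake", "fake", "real", "real"]
    y_pred = ["fake", "real", "real", "real"]
    # precision = 1.0, recall = 0.5
    print(round(f1_score(y_true, y_pred), 4))
```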

11 PAPERS • 1 BENCHMARK

Fakeddit

Fakeddit is a novel multimodal dataset for fake news detection consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the samples are labeled according to 2-way, 3-way, and 6-way classification categories through distant supervision.

9 PAPERS • NO BENCHMARKS YET

MM-COVID

MM-COVID (Multilingual and Multidimensional COVID-19 Fake News Data Repository)

MM-COVID is a dataset for fake news detection related to COVID-19. It provides multilingual fake news and the relevant social context. It contains 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in 6 languages: English, Spanish, Portuguese, Hindi, French, and Italian.

8 PAPERS • NO BENCHMARKS YET

NELA-GT-2018

NELA-GT-2018 is a dataset for the study of misinformation that consists of 713k articles collected between February 2018 and November 2018. These articles are collected directly from 194 news and media outlets, including mainstream, hyper-partisan, and conspiracy sources. It includes ground truth ratings of the sources from 8 different assessment sites covering multiple dimensions of veracity, including reliability, bias, transparency, adherence to journalistic standards, and consumer trust.

8 PAPERS • NO BENCHMARKS YET

Weibo21

Weibo21 is a benchmark dataset for multi-domain fake news detection (MFND) with annotated domain labels. It consists of 4,488 fake news items and 4,640 real news items from 9 different domains.

8 PAPERS • NO BENCHMARKS YET

UPFD (User Preference-aware Fake News Detection)

For benchmarking, please refer to its variants UPFD-POL and UPFD-GOS.

7 PAPERS • 2 BENCHMARKS

BanFakeNews

An annotated dataset of ~50K news items that can be used to build automated fake news detection systems for a low-resource language such as Bangla.

5 PAPERS • NO BENCHMARKS YET

NELA-GT-2019

NELA-GT-2019 is an updated version of the NELA-GT-2018 dataset. NELA-GT-2019 contains 1.12M news articles from 260 sources collected between January 1st 2019 and December 31st 2019. Just as with NELA-GT-2018, these sources come from a wide range of mainstream news sources and alternative news sources. Included with the dataset are source-level ground truth labels from 7 different assessment sites covering multiple dimensions of veracity.
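Because the ground truth is provided per source rather than per article, a common way to use such data is to propagate each source's label to its articles. The sketch below shows that join under stated assumptions: the record layout (`source`, `title`), the label values, and the function name `label_articles` are hypothetical and do not reflect the dataset's actual schema.

```python
def label_articles(articles, source_labels):
    """Attach each source's label to its articles; skip unlabeled sources.

    `articles` is a list of dicts with a "source" key (hypothetical layout);
    `source_labels` maps a source name to its source-level label.
    """
    labeled = []
    for article in articles:
        label = source_labels.get(article["source"])
        if label is not None:
            labeled.append({**article, "label": label})
    return labeled


if __name__ == "__main__":
    # Toy records, not drawn from the dataset.
    articles = [
        {"source": "outlet_a", "title": "Story 1"},
        {"source": "outlet_b", "title": "Story 2"},
        {"source": "outlet_c", "title": "Story 3"},
    ]
    source_labels = {"outlet_a": "reliable", "outlet_b": "unreliable"}
    for row in label_articles(articles, source_labels):
        print(row["title"], row["label"])
```

Labels obtained this way are noisy at the article level, since a generally reliable source can still publish an inaccurate article and vice versa.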

5 PAPERS • NO BENCHMARKS YET

MuMiN

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade.

4 PAPERS • 3 BENCHMARKS

NELA-GT-2020

NELA-GT-2020 is an updated version of the NELA-GT-2019 dataset. NELA-GT-2020 contains nearly 1.8M news articles from 519 sources collected between January 1st, 2020 and December 31st, 2020. Just as with NELA-GT-2018 and NELA-GT-2019, these sources come from a wide range of mainstream news sources and alternative news sources. Included in the dataset are source-level ground truth labels from Media Bias/Fact Check (MBFC) covering multiple dimensions of veracity. Additionally, new in the 2020 dataset are the Tweets embedded in the collected news articles, adding an extra layer of information to the data.

4 PAPERS • NO BENCHMARKS YET

Some Like it Hoax

Some Like it Hoax is a fake news detection dataset consisting of 15,500 Facebook posts and 909,236 users.

4 PAPERS • NO BENCHMARKS YET

UPFD-GOS (User Preference-aware Fake News Detection)

The GossipCop variant of the UPFD dataset for benchmarking.

3 PAPERS • 1 BENCHMARK

AraCOVID19-MFH

AraCOVID19-MFH (Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset)

AraCOVID19-MFH is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset. The dataset contains 10,828 Arabic tweets annotated with 10 different labels.

2 PAPERS • NO BENCHMARKS YET

UPFD-POL (User Preference-aware Fake News Detection)

The PolitiFact variant of the UPFD dataset for benchmarking.

2 PAPERS • 1 BENCHMARK

BanMANI

A Dataset to Identify Manipulated Social Media News in Bangla

1 PAPER • NO BENCHMARKS YET

Fake News Filipino Dataset

Expertly-curated benchmark dataset for fake news detection in Filipino.

1 PAPER • NO BENCHMARKS YET

Twitter MediaEval

Twitter MediaEval (MediaEval Benchmarking Initiative for Multimedia Evaluation)

The task addresses the problem of the appearance and propagation of posts that share misleading multimedia content (images or video). In the context of the task, different types of misleading use are considered:

1 PAPER • NO BENCHMARKS YET

CIDII Dataset (Correct Information and Disinformation about Islamic Issues)

The CIDII dataset is a binary classification dataset with two classes, correct information and disinformation, related to Islamic issues. It accompanies our research paper, "Disinformation Detection about Islamic Issues on Social Media Using Deep Learning Techniques", published in the MJCS journal: https://ejournal.um.edu.my/index.php/MJCS/article/view/41935

0 PAPERS • NO BENCHMARKS YET
