MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags). It comprises 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade.
4 PAPERS • 3 BENCHMARKS
WikiText-TL-39 is a benchmark language modeling dataset in Filipino that has 39 million tokens in the training set.
3 PAPERS • NO BENCHMARKS YET
NewsPH-NLI is a sentence entailment benchmark dataset in the low-resource Filipino language.
2 PAPERS • NO BENCHMARKS YET
An expertly curated benchmark dataset for fake news detection in Filipino.
1 PAPER • NO BENCHMARKS YET
A Filipino multi-modal language dataset for image-conditional language generation and text-conditional image generation. It consists of 351,755 Filipino news articles gathered from Filipino news outlets.
0 PAPERS • NO BENCHMARKS YET