5 dataset results for Filipino

MuMiN

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags). It spans 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, over more than a decade.

4 PAPERS • 3 BENCHMARKS

WikiText-TL-39

WikiText-TL-39 is a benchmark language modeling dataset in Filipino that has 39 million tokens in the training set.

3 PAPERS • NO BENCHMARKS YET

NewsPH-NLI

NewsPH-NLI is a sentence entailment benchmark dataset in the low-resource Filipino language.

2 PAPERS • NO BENCHMARKS YET

Fake News Filipino Dataset

An expertly curated benchmark dataset for fake news detection in Filipino.

1 PAPER • NO BENCHMARKS YET

BalitaNLP

A Filipino multi-modal language dataset for image-conditional language generation and text-conditional image generation. It consists of 351,755 Filipino news articles gathered from Filipino news outlets.

0 PAPERS • NO BENCHMARKS YET
