The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. In order to reflect the true information need of general users, Bing query logs were used as the question source. Each question is linked to a Wikipedia page that potentially has the answer. Because the summary section of a Wikipedia page provides the basic and usually most important information about the topic, sentences in this section were used as the candidate answers. The corpus includes 3,047 questions and 29,258 sentences, where 1,473 sentences were labeled as answer sentences to their corresponding questions.
189 PAPERS • 2 BENCHMARKS
Text Retrieval Conference Question Answering (TrecQA) is a dataset created from the TREC-8 (1999) to TREC-13 (2004) Question Answering tracks. There are two versions of TrecQA: raw and clean. Both versions have the same training set but their development and test sets differ. The commonly used clean version of the dataset excludes questions in development and test sets with no answers or only positive/negative answers. The clean version has 1,229/65/68 questions and 53,417/1,117/1,442 question-answer pairs for the train/dev/test split.
72 PAPERS • 3 BENCHMARKS
Ubuntu Dialogue Corpus (UDC) is a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter.
44 PAPERS • 8 BENCHMARKS
InsuranceQA is a question answering dataset for the insurance domain, the data stemming from the website Insurance Library. There are 12,889 questions and 21,325 answers in the training set. There are 2,000 questions and 3,354 answers in the validation set. There are 2,000 questions and 3,308 answers in the test set.
38 PAPERS • NO BENCHMARKS YET
A large scale dataset to enable the transfer step, exploiting the Natural Questions dataset.
25 PAPERS • 1 BENCHMARK
CICERO contains 53,000 inferences for five commonsense dimensions -- cause, subsequent event, prerequisite, motivation, and emotional reaction -- collected from 5600 dialogues. It involves two challenging generative and multi-choice alternative selection tasks for the state-of-the-art NLP models to solve. Download the dataset using this link.
12 PAPERS • 4 BENCHMARKS
SelQA is a dataset that consists of questions generated through crowdsourcing and sentence length answers that are drawn from the ten most prevalent topics in the English Wikipedia.
9 PAPERS • NO BENCHMARKS YET
WikiHowQA is a Community-based Question Answering dataset, which can be used for both answer selection and abstractive summarization tasks. It contains 76,687 questions in the train set, 8,000 in the development set and 22,354 in the test set.
5 PAPERS • NO BENCHMARKS YET
A question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in the Portuguese language and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while answers were elaborated by specialists from Embrapa's customer service.
2 PAPERS • NO BENCHMARKS YET
A publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.
1 PAPER • NO BENCHMARKS YET