42 dataset results for Semantic Parsing

FrameNet

FrameNet is a linguistic knowledge graph containing information about lexical and predicate-argument semantics of the English language. FrameNet contains two distinct entity classes: frames and lexical units, where a frame is a meaning and a lexical unit is a single meaning for a word.

434 PAPERS • NO BENCHMARKS YET

ATIS (Airline Travel Information Systems)

ATIS (Airline Travel Information Systems) is a dataset consisting of audio recordings and corresponding manual transcripts of people asking for flight information on automated airline travel inquiry systems. The data consists of 17 unique intent categories. The original split contains 4,478, 500, and 893 intent-labeled reference utterances in the train, development, and test sets, respectively.

263 PAPERS • 7 BENCHMARKS

WikiSQL

WikiSQL consists of a corpus of 87,726 hand-annotated pairs of SQL queries and natural language questions. These pairs are split into training (61,297 examples), development (9,145 examples), and test (17,284 examples) sets. It can be used for natural language interface tasks related to relational databases.

227 PAPERS • 3 BENCHMARKS

SCAN (Simplified versions of the CommAI Navigation tasks)

SCAN is a dataset for grounded navigation which consists of a set of simple compositional navigation commands paired with the corresponding action sequences.

136 PAPERS • NO BENCHMARKS YET

Spider-Realistic

The Spider-Realistic dataset is used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". The dataset is based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove explicit mentions of column names while keeping the SQL queries unchanged, to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.

66 PAPERS • 2 BENCHMARKS

WikiTableQuestions

WikiTableQuestions is a question answering dataset over semi-structured tables. It comprises question-answer pairs on HTML tables and was constructed by selecting data tables from Wikipedia that contained at least 8 rows and 5 columns. Amazon Mechanical Turk workers were then tasked with writing trivia questions about each table. WikiTableQuestions contains 22,033 questions. The questions were not generated from predefined templates but were handcrafted by users, demonstrating high linguistic variance. Compared to previous datasets on knowledge bases, it covers nearly 4,000 unique column headers, containing far more relations than closed-domain datasets and datasets for querying knowledge bases. Its questions cover a wide range of domains and require operations such as table lookup, aggregation, superlatives (argmax, argmin), arithmetic operations, joins, and unions.
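As a rough illustration of the operation types named above (lookup, aggregation, superlatives, arithmetic), the sketch below runs them over a tiny table. The table, its column names, and the example questions are hypothetical and are not taken from WikiTableQuestions itself.

```python
# Hypothetical toy table; rows and column names are invented for illustration.
table = [
    {"city": "Oslo", "year": 2010, "medals": 3},
    {"city": "Lima", "year": 2014, "medals": 7},
    {"city": "Baku", "year": 2018, "medals": 5},
]

# Table lookup: "How many medals were won in 2014?"
lookup = next(row["medals"] for row in table if row["year"] == 2014)

# Aggregation: "How many medals were won in total?"
total = sum(row["medals"] for row in table)

# Superlative (argmax): "Which city had the most medals?"
best_city = max(table, key=lambda row: row["medals"])["city"]

# Arithmetic: "How many more medals were won in 2014 than in 2010?"
by_year = {row["year"]: row["medals"] for row in table}
difference = by_year[2014] - by_year[2010]

print(lookup, total, best_city, difference)
```

Each question maps to a small program over the table, which is the kind of latent structure a semantic parser for this task would need to recover.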

62 PAPERS • 1 BENCHMARK

CFQ (Compositional Freebase Questions)

A large and realistic natural language question answering dataset.

61 PAPERS • 1 BENCHMARK

Occluded REID

Occluded REID is an occluded person dataset captured by mobile cameras, consisting of 2,000 images of 200 occluded persons. Each identity has 5 full-body person images and 5 occluded person images with different types of occlusion.

59 PAPERS • 1 BENCHMARK

WebQuestionsSP (WebQuestions Semantic Parses Dataset)

The WebQuestionsSP dataset is released as part of our ACL-2016 paper “The Value of Semantic Parse Labeling for Knowledge Base Question Answering” [Yih, Richardson, Meek, Chang & Suh, 2016], in which we evaluated the value of gathering semantic parses, vs. answers, for a set of questions that originally comes from WebQuestions [Berant et al., 2013]. The WebQuestionsSP dataset contains full semantic parses in SPARQL queries for 4,737 questions, and “partial” annotations for the remaining 1,073 questions for which a valid parse could not be formulated or where the question itself is bad or needs a descriptive answer. This release also includes an evaluation script and the output of the STAGG semantic parsing system when trained using the full semantic parses. More detail can be found in the document and labeling instructions included in this release, as well as the paper.

56 PAPERS • 5 BENCHMARKS

ComplexWebQuestions

ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways:

55 PAPERS • 2 BENCHMARKS

NomBank

NomBank is an annotation project at New York University that is related to the PropBank project at the University of Colorado. The goal is to mark the sets of arguments that co-occur with nouns in the PropBank Corpus (the Wall Street Journal Corpus of the Penn Treebank), just as PropBank records such information for verbs. As a side effect of the annotation process, the authors are producing a number of other resources, including various dictionaries as well as PropBank-style lexical entries called frame files. These resources help the user label the various arguments and adjuncts of the head nouns with roles (sets of argument labels for each sense of each noun). NYU and the University of Colorado are making a coordinated effort to ensure that, when possible, role definitions are consistent across parts of speech. For example, PropBank's frame file for the verb "decide" was used in the annotation of the noun "decision".

49 PAPERS • NO BENCHMARKS YET

SParC (Semantic Parsing in Context)

SParC is a large-scale dataset for complex, cross-domain, and context-dependent (multi-turn) semantic parsing and text-to-SQL tasks (interactive natural language interfaces for relational databases).

48 PAPERS • 2 BENCHMARKS

CONCODE

CONCODE is a large dataset with over 100,000 examples consisting of Java classes from online code repositories. Its authors also developed an encoder-decoder architecture that models the interaction between the method documentation and the class environment.

35 PAPERS • 1 BENCHMARK

CSQA

CSQA contains around 200K dialogs with a total of 1.6M turns. Unlike existing large-scale QA datasets, which contain simple questions that can be answered from a single tuple, the questions in these dialogs require a larger subgraph of the KG.

35 PAPERS • NO BENCHMARKS YET

CoSQL (Conversational Text-to-SQL Challenge)

CoSQL is a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions.

33 PAPERS • 1 BENCHMARK

SQA (SequentialQA)

The SQA dataset was created to explore the task of answering sequences of inter-related questions on HTML tables. It has 6,066 sequences with 17,553 questions in total.

30 PAPERS • 1 BENCHMARK

TOPv2 (Task Oriented Parsing v2)

Task Oriented Parsing v2 (TOPv2) representations for intent-slot based dialog systems.

25 PAPERS • NO BENCHMARKS YET

AMR Bank (Abstract Meaning Representation)

The AMR Bank is a set of English sentences paired with simple, readable semantic representations. Version 3.0 released in 2020 consists of 59,255 sentences.

22 PAPERS • 1 BENCHMARK

Hearthstone

This dataset contains card descriptions of the card game Hearthstone and the code that implements them. These are obtained from the open-source implementation Hearthbreaker (https://github.com/danielyule/hearthbreaker).

21 PAPERS • NO BENCHMARKS YET

WebChild

One of the largest commonsense knowledge bases available, describing over 2 million disambiguated concepts and activities, connected by over 18 million assertions.

20 PAPERS • NO BENCHMARKS YET

QuaRel

QuaRel is a crowdsourced dataset of 2771 multiple-choice story questions, including their logical forms.

19 PAPERS • NO BENCHMARKS YET

Groningen Meaning Bank

Groningen Meaning Bank is a semantic resource that anyone can edit and that integrates various semantic phenomena, including predicate-argument structure, scope, tense, thematic roles, animacy, pronouns, and rhetorical relations.

18 PAPERS • NO BENCHMARKS YET

Geometry3K

Geometry3K is a large-scale geometry problem-solving dataset with 3,002 multi-choice geometry problems, dense annotations in formal language for the diagrams and text, 27,213 annotated diagram logic forms (literals), and 6,293 annotated text logic forms (literals).

17 PAPERS • 1 BENCHMARK

KQA Pro

A large-scale dataset for Complex KBQA.

17 PAPERS • 1 BENCHMARK

GraphQuestions

GraphQuestions is a characteristic-rich dataset designed for factoid question answering. The dataset aims to provide a systematic way of constructing QA datasets with rich and explicitly specified question characteristics.

13 PAPERS • 2 BENCHMARKS

Fashion 144K

Fashion 144K is a novel heterogeneous dataset with 144,169 user posts containing diverse image, textual and meta information.

11 PAPERS • NO BENCHMARKS YET

ComQA

ComQA is a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology.

9 PAPERS • NO BENCHMARKS YET

SEDE (Stack Exchange Data Explorer)

SEDE is a dataset of 12,023 complex and diverse SQL queries with their natural language titles and descriptions, written by real users of the Stack Exchange Data Explorer in the course of natural interaction. These pairs contain a variety of real-world challenges that have rarely been reflected so far in any other semantic parsing dataset. The goal of this dataset is to take a significant step towards evaluating Text-to-SQL models in a real-world setting. Compared to other Text-to-SQL datasets, SEDE contains at least 10 times more SQL query templates (queries after canonization and anonymization of values), and has the most diverse set of utterances and SQL queries (in terms of 3-grams) out of all single-domain datasets. SEDE introduces real-world challenges such as under-specification, usage of parameters in queries, date manipulation, and more.

8 PAPERS • 1 BENCHMARK

SPLASH

A dataset of utterances, incorrect SQL interpretations and the corresponding natural language feedback.

6 PAPERS • NO BENCHMARKS YET

FollowUp

1000 query triples on 120 tables.

5 PAPERS • NO BENCHMARKS YET

Szeged Corpus

The Szeged Treebank is the largest fully manually annotated treebank of the Hungarian language. It contains 82,000 sentences, 1.2 million words and 250,000 punctuation marks. Texts were selected from six different domains, ~200,000 words in size from each. The domains are the following:

4 PAPERS • NO BENCHMARKS YET

aethel (Automatically Extracted Theorems from Lassy)

A dataset of approximately 75,000 phrases and sentences, syntactically analyzed as typelogical derivations (i.e. proofs of modal intuitionistic linear logic, or programs of the corresponding λ calculus). Analyses were obtained by transforming the dependency graphs of the Lassy-Small corpus.

4 PAPERS • NO BENCHMARKS YET

Multilingual TOP

Multilingual TOP is a dataset for multilingual semantic parsing with human-written sentences as opposed to machine translated ones. The dataset sentences are in English, Italian and Japanese and it is based on the Facebook Task Oriented Parsing (TOP) dataset.

3 PAPERS • NO BENCHMARKS YET

MIMIC-SPARQL

Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine. In this light, QA on Electronic Health Records (EHR), namely EHR QA, can work as a crucial milestone toward developing an intelligent agent in healthcare. EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA.

2 PAPERS • NO BENCHMARKS YET

PCFG SET (Probabilistic Context Free Grammar String Edit Task)

The Probabilistic Context Free Grammar String Edit Task (PCFG SET) dataset is a dataset with sequence to sequence problems specifically designed to test different aspects of compositional generalisation. In particular, the dataset contains splits to test for systematicity, productivity, substitutivity, localism and overgeneralisation.

2 PAPERS • NO BENCHMARKS YET

Spades (Semantic PArsing of DEclarative Sentences)

Spades contains 93,319 questions derived from ClueWeb09 sentences. Specifically, the questions were created by randomly removing an entity from each sentence, thus producing sentence-denotation pairs.
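The construction described above can be sketched roughly as follows. This is a minimal illustration only: the helper name `make_cloze`, the `_blank_` placeholder, the example sentence, and the assumption that entity mentions are given as exact substrings are all hypothetical, and this is not the procedure or format used by the Spades authors.

```python
import random


def make_cloze(sentence, entities, rng, blank="_blank_"):
    """Remove one randomly chosen entity mention from the sentence.

    Returns (question, denotation), where the denotation is the removed entity.
    Entities are assumed to appear in the sentence as exact substrings.
    """
    present = [entity for entity in entities if entity in sentence]
    if not present:
        raise ValueError("no listed entity occurs in the sentence")
    answer = rng.choice(present)
    # Replace only the first occurrence of the chosen mention.
    question = sentence.replace(answer, blank, 1)
    return question, answer


rng = random.Random(0)  # seeded so the choice is reproducible
sentence = "Marie Curie was born in Warsaw"
question, answer = make_cloze(sentence, ["Marie Curie", "Warsaw"], rng)
print(question, "->", answer)
```

Restoring the removed entity in place of the placeholder yields the original declarative sentence, which is what makes each pair usable as a question with a known answer.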

2 PAPERS • NO BENCHMARKS YET

Stanford Schema2QA Dataset

Schema2QA is the first large question answering dataset over real-world Schema.org data. It covers 6 common domains (restaurants, hotels, people, movies, books, and music), based on crawled Schema.org metadata from 6 different websites (Yelp, Hyatt, LinkedIn, IMDb, Goodreads, and last.fm). In total, there are over 2,000,000 examples for training, consisting of both augmented human paraphrase data and high-quality synthetic data generated by Genie. All questions are annotated with programs in ThingTalk, an executable virtual assistant programming language.

2 PAPERS • NO BENCHMARKS YET

ViText2SQL

ViText2SQL is a dataset for the Vietnamese Text-to-SQL semantic parsing task, consisting of about 10K question and SQL query pairs.

2 PAPERS • NO BENCHMARKS YET

Conic10K

Conic10K is an open-ended math problem dataset on conic sections in Chinese senior high school education. This dataset contains 10,861 carefully annotated problems, each with a formal representation, the corresponding text spans, the answer, and natural language rationales. These questions require long reasoning steps, while the topic is limited to conic sections. It can be used to evaluate models on 2 tasks: semantic parsing and mathematical question answering (mathQA).

1 PAPER • NO BENCHMARKS YET

Hinglish-TOP

Hinglish-TOP is a human-annotated code-switched semantic parsing dataset containing 10k human annotations for Hindi-English (Hinglish) code-switched utterances, and over 170K CST5-generated code-switched utterances from the TOPv2 dataset.

1 PAPER • NO BENCHMARKS YET

SimpleQuestionsWikiData

SimpleQuestionsWikidata maps SimpleQuestions to Wikidata.

1 PAPER • NO BENCHMARKS YET

TurkQA

TurkQA consists of a selection of sentences from English Wikipedia articles, with questions and answers crowdsourced from workers on Amazon Mechanical Turk.

1 PAPER • NO BENCHMARKS YET