The Multi-domain Wizard-of-Oz (MultiWOZ) dataset is a large-scale human-human conversational corpus spanning over seven domains, containing 8438 multi-turn dialogues, with each dialogue averaging 14 turns. Different from existing standard datasets like WOZ and DSTC2, which contain less than 10 slots and only a few hundred values, MultiWOZ has 30 (domain, slot) pairs and over 4,500 possible values. The dialogues span seven domains: restaurant, hotel, attraction, taxi, train, hospital and police.
316 PAPERS • 11 BENCHMARKS
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, user simulation learning, among other tasks in large-scale virtual assistants. Besides these, the dataset has unseen domains and services in the evaluation set to quantify the performance in zero-shot or few shot settings.
168 PAPERS • 3 BENCHMARKS
MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is modified from Chinese high school English listening comprehension test data. It tests dialogue reasoning via next utterance prediction.
53 PAPERS • NO BENCHMARKS YET
CrossWOZ is the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances for 5 domains, including hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotation of dialogue states and dialogue acts at both user and system sides.
25 PAPERS • NO BENCHMARKS YET
Task Oriented Parsing v2 (TOPv2) representations for intent-slot based dialog systems.
Action-Based Conversations Dataset (ABCD) is a goal-oriented dialogue fully-labeled dataset with over 10K human-to-human dialogues containing 55 distinct user intents requiring unique sequences of actions constrained by policies to achieve task success. The dataset is proposed to study customer service dialogue systems in more realistic settings.
8 PAPERS • 1 BENCHMARK
BiToD is a bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches.
8 PAPERS • NO BENCHMARKS YET
The ODSQA dataset is a spoken dataset for question answering in Chinese. It contains more than three thousand questions from 20 different speakers.
3 PAPERS • NO BENCHMARKS YET
A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.
2 PAPERS • NO BENCHMARKS YET
SSD (Sub-slot Dialog) dataset: This is the dataset for the ACL 2022 paper "A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots". arxiv
SSD (Sub-slot Dialog) dataset: This is the dataset for the ACL 2022 paper "A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots".
A novel dataset of document-grounded task-based dialogues, where an Information Giver (IG) provides instructions (by consulting a document) to an Information Follower (IF), so that the latter can successfully complete the task. In this unique setting, the IF can ask clarification questions which may not be grounded in the underlying document and require commonsense knowledge to be answered.
The Arabic-TOD dataset is based on the BiToD dataset. Of the 3,689 BiToD-English dialogues, 1,500 dialogues (30,000 utterances) were translated into Arabic. We translated the task-related keywords such as cuisine, dietary restrictions, and price-level for the restaurant domain, price-level for the hotel domain, type, and price-level for the attraction domain, day, weather, and city for the weather domain. We keep the rest of values without translation, like hotels’ and restaurants’ names, locations, and addresses. These values are real entities in Hong Kong city (literals), and most of them contain Chinese words written in English, therefore they have not been translated. According to the slot-values in the Arabic-TOD dataset, we used the slots names as they are in English and translated their corresponding values, except the entities in Hong Kong city since the Arabic-TOD dataset supports codeswitching.
1 PAPER • NO BENCHMARKS YET
HR-Multiwoz is a fully-labeled dataset of 550 conversations spanning 10 HR domains to evaluate LLM Agent. It is the first labeled open-sourced conversation dataset in the HR domain for NLP research. Please refer to HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent for details about the dataset construction.
Simulates unanticipated user needs in the deployment stage.
MultiWOZ-coref, (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.
1 PAPER • 1 BENCHMARK