For goal-oriented document-grounded dialogs, it often involves complex contexts for identifying the most relevant information, which requires better understanding of the inter-relations between conversations and documents. Meanwhile, many online user-oriented documents use both semi-structured and unstructured contents for guiding users to access information of different contexts. Thus, we create a new goal-oriented document-grounded dialogue dataset that captures more diverse scenarios derived from various document contents from multiple domains such ssa.gov and studentaid.gov. For data collection, we propose a novel pipeline approach for dialogue data construction, which has been adapted and evaluated for several domains.
34 PAPERS • NO BENCHMARKS YET
A machine reading comprehension (MRC) dataset with discourse structure built over multiparty dialog. Molweni's source samples from the Ubuntu Chat Corpus, including 10,000 dialogs comprising 88,303 utterances.
25 PAPERS • 2 BENCHMARKS
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them.
13 PAPERS • 1 BENCHMARK
In MutualFriends, two agents, A and B, each have a private knowledge base, which contains a list of friends with multiple attributes (e.g., name, school, major, etc.). The agents must chat with each other to find their unique mutual friend.
7 PAPERS • NO BENCHMARKS YET
DiaASQ is a fine-grained Aspect-based Sentiment Analysis (ABSA) benchmark under the conversation scenario. It challenges existing ABSA methods by 1) extracting quadruple of target-aspect-opinion-sentiment in a dialogue, and 2) modeling the dialogue discourse structures. The dataset is constructed by systematically crawling tweets from digital bloggers, followed by a series of preprocessing steps including filtering, normalizing, pruning, and annotating the collected dialogues, resulting in a final corpus of 1,000 dialogues. To enhance the multilingual usability, DiaASQ has both the English and Chinese versions of languages.
3 PAPERS • 2 BENCHMARKS
Emotional Dialogue Acts data contains dialogue act labels for existing emotion multi-modal conversational datasets. We chose two popular multimodal emotion datasets: Multimodal EmotionLines Dataset (MELD) and Interactive Emotional dyadic MOtion CAPture database (IEMOCAP). EDAs reveal associations between dialogue acts and emotional states in a natural-conversational language such as Accept/Agree dialogue acts often occur with the Joy emotion, Apology with Sadness, and Thanking with Joy.
3 PAPERS • NO BENCHMARKS YET
WDC-Dialogue is a dataset built from the Chinese social media to train EVA. Specifically, conversations from various sources are gathered and a rigorous data cleaning pipeline is designed to enforce the quality of WDC-Dialogue.
Harry Potter Dialogue is the first dialogue dataset that integrates with scene, attributes and relations which are dynamically changed as the storyline goes on. Our work can facilitate research to construct more human-like conversational systems in practice. For example, virtual assistant, NPC in games, etc. Moreover, HPD can both support dialogue generation and retrieval tasks.
1 PAPER • 2 BENCHMARKS
The Mafia Dataset was created to model the behavior of deceptive actors in the context of the Mafia game, as described in the paper “Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia”. We hope that this dataset will be of use to others studying the effects of deception on language use.
1 PAPER • NO BENCHMARKS YET