QUICK REVIEW

[Paper Review] ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)

Mohammad Aliannejadi, Julia Kiseleva|arXiv (Cornell University)|Sep 23, 2020

Topic Modeling | References: 9 | Cited by: 32

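One-Sentence Summary

This paper describes the ClariQ challenge on generating and evaluating clarifying questions in open-domain dialogue. Stage 1 uses static data and Stage 2 uses human-in-the-loop evaluation, in order to study when to ask for clarification and how to generate clarifying questions.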

ABSTRACT

This document presents a detailed description of the challenge on clarifying questions for dialogue systems (ClariQ). The challenge is organized as part of the Conversational AI challenge series (ConvAI3) at the Search Oriented Conversational AI (SCAI) EMNLP workshop in 2020. The main aim of conversational systems is to return an appropriate answer in response to user requests. However, some user requests might be ambiguous. In IR settings such a situation is handled mainly through the diversification of the search result page. It is however much more challenging in dialogue settings with limited bandwidth. Therefore, in this challenge, we provide a common evaluation framework to evaluate mixed-initiative conversations. Participants are asked to rank clarifying questions in information-seeking conversations. The challenge is organized in two stages, where in Stage 1 we evaluate the submissions in an offline setting and single-turn conversations. Top participants of Stage 1 get the chance to have their model tested by human annotators.

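Motivation and Objectives

Motivate the need for clarifying questions in open-domain dialogue as a way to handle ambiguous user requests.
Propose the two-stage ClariQ challenge to study when to ask for clarification and how to generate clarifying questions.
Provide a dataset, evaluation metrics, and a human-in-the-loop protocol for benchmarking methods.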

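Proposed Method

Introduce Stage 1, which uses a static dataset of initial user requests, candidate clarifying questions, and user answers, with train/validation/test splits.
Define two tasks: (a) predict a score from 1 to 4 indicating whether a clarifying question is needed; (b) select the most suitable clarifying question from a question bank.
Use retrieval metrics (MRR, P@k, nDCG) to measure the effect of clarification on downstream document retrieval.
Stage 2 involves real users (human-in-the-loop) and evaluates dialogue systems that must decide when to clarify and how to respond, favoring generative models for dialogue management.
Provide a public repository containing the data and evaluation scripts.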
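
The retrieval metrics named above (MRR, P@k, nDCG) can be sketched as follows. This is a minimal illustration and not the challenge's official evaluation script: the function names are hypothetical, relevance is given as a list of graded labels in ranked order, and nDCG uses linear gain with the ideal ranking taken from the retrieved list only, which is a simplifying assumption.

```python
import math
from typing import List, Sequence


def reciprocal_rank(relevances: Sequence[int]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Sequence[int]]) -> float:
    """Average the reciprocal rank over several queries."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs) if runs else 0.0


def precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Fraction of the top-k positions that hold a relevant result."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the same labels sorted best-first.

    Assumption: the ideal ranking is built from the retrieved labels only.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Comparing these scores for a ranking produced from the original request alone against a ranking produced after a clarifying question and its answer gives one way to quantify the effect of clarification.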

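Experimental Results

Research Questions

RQ1: When should a clarifying question be asked in a conversation?
RQ2: How should clarifying questions be generated?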
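
To make the two research questions concrete, the sketch below pairs a "when to ask" decision with a "which question" selection step. It is an illustrative baseline only, not a method from the paper: the threshold value, the assumption that a higher score on the 1 to 4 scale means a stronger need for clarification, the token-overlap scoring, and all function names are hypothetical.

```python
import re
from typing import List, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_question(request: str, question_bank: List[str]) -> str:
    """Pick the bank question with the highest Jaccard overlap with the request."""
    request_tokens = tokenize(request)

    def jaccard(question: str) -> float:
        question_tokens = tokenize(question)
        union = request_tokens | question_tokens
        return len(request_tokens & question_tokens) / len(union) if union else 0.0

    return max(question_bank, key=jaccard)


def maybe_clarify(
    request: str,
    clarification_need: int,
    question_bank: List[str],
    threshold: int = 3,  # hypothetical cut-off on the 1-4 scale
) -> Optional[str]:
    """Return a clarifying question if the need score reaches the threshold."""
    if clarification_need >= threshold:
        return select_question(request, question_bank)
    return None  # answer directly without asking for clarification
```

For example, with a bank containing "Do you want to know about dinosaur fossils?" and the request "tell me about dinosaur fossils", a need score of 4 returns that question, while a need score of 1 returns `None`.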

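Key Findings

Stage 1 provides a dataset of user requests, clarifying questions, and answers, with a 70/30 train/validation split and a test setup that includes a question-and-answer bank.
Evaluation reports retrieval metrics (MRR, P@1/3/5/10/20, nDCG@1/3/5/20) together with Recall@10/20/30 for question relevance.
Stage 2 highlights a shift toward generative modeling and real-user evaluation, aiming to balance asking clarifying questions against providing relevant answers.
Human-in-the-loop evaluation collects the dialogue history, system responses, and user ratings of relevance and naturalness.
A GitHub repository hosts the data and automatic evaluation scripts.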
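
Recall@k for question relevance, as listed above, can be sketched like this. The question IDs and function names are hypothetical, and this is a simplified illustration, not the repository's evaluation script.

```python
from typing import Dict, Iterable, List


def recall_at_k(ranked_ids: List[str], relevant_ids: Iterable[str], k: int) -> float:
    """Share of the relevant questions that appear in the top-k ranked questions."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, List[str]],
    k: int,
) -> float:
    """Average Recall@k over all topics that have a ranking."""
    if not rankings:
        return 0.0
    scores = [
        recall_at_k(ranked, relevant.get(topic, []), k)
        for topic, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)
```

Here a system that ranks `["q3", "q7", "q1", "q9"]` for a topic whose relevant questions are `q1` and `q2` scores 0.0 at k=2 and 0.5 at k=3.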


This review was generated by AI and reviewed by human editors.