QUICK REVIEW

[Paper Review] Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

Ashwin K. Vijayakumar, Michael Cogswell|arXiv (Cornell University)|Oct 7, 2016

Multimodal Machine Learning Applications | References: 18 | Cited by: 358

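One-Sentence Summary

Diverse Beam Search (DBS) decodes a set of diverse outputs by optimizing a diversity-augmented objective. Compared with standard beam search, it finds better top-1 solutions at minimal extra cost and applies to a range of tasks.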

ABSTRACT

Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-B candidates - resulting in sequences that differ only slightly from each other. Producing lists of nearly identical sequences is not only computationally wasteful but also typically fails to capture the inherent ambiguity of complex AI tasks. To overcome this problem, we propose Diverse Beam Search (DBS), an alternative to BS that decodes a list of diverse outputs by optimizing for a diversity-augmented objective. We observe that our method finds better top-1 solutions by controlling for the exploration and exploitation of the search space - implying that DBS is a better search algorithm. Moreover, these gains are achieved with minimal computational or memory overhead as compared to beam search. To demonstrate the broad applicability of our method, we present results on image captioning, machine translation and visual question generation using both standard quantitative metrics and qualitative human studies. Further, we study the role of diversity for image-grounded language generation tasks as the complexity of the image changes. We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.

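Motivation and Goals

Motivate the need to capture output diversity in neural sequence decoding, beyond what conventional beam search provides.
Introduce Diverse Beam Search (DBS) as a diversity-augmented decoding method.
Show that DBS improves top-1 performance across tasks while keeping computational cost close to that of BS.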

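Proposed Method

Propose a diversity-augmented objective that encourages diverse candidate sequences during decoding.
Keep a beam-search-style procedure, but select multiple diverse hypotheses instead of near-duplicate outputs.
Demonstrate applicability to image captioning, machine translation, and visual question generation.
Evaluate with standard quantitative metrics and qualitative human studies.
Analyze how diversity affects language generation as image complexity changes.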
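
The steps above can be sketched in code. This review does not spell out the objective, so the sketch below rests on assumptions: the beam is split into groups that are decoded one after another, and each later group is penalized for choosing a token that an earlier group already chose at the same time step (a Hamming-style penalty). The names `diverse_beam_search`, `toy_model`, `num_groups`, and `diversity_strength` are illustrative and do not come from the source, and the toy model stands in for a real neural sequence model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[List[int], float]  # (token ids, cumulative model log-probability)


def diverse_beam_search(
    step_log_probs: Callable[[Sequence[int]], Sequence[float]],
    vocab_size: int,
    num_steps: int,
    beam_width: int,
    num_groups: int,
    diversity_strength: float,
) -> List[List[Hypothesis]]:
    """Group-wise beam search with a Hamming-style diversity penalty (a sketch)."""
    if beam_width % num_groups != 0:
        raise ValueError("beam_width must be divisible by num_groups")
    group_size = beam_width // num_groups
    # Every group starts from the empty prefix with score 0.
    groups: List[List[Hypothesis]] = [[([], 0.0)] for _ in range(num_groups)]

    for _ in range(num_steps):
        # How often each token was picked at this step by earlier groups.
        token_counts = [0] * vocab_size
        for g in range(num_groups):
            candidates = []
            for tokens, score in groups[g]:
                log_probs = step_log_probs(tokens)
                for tok in range(vocab_size):
                    model_score = score + log_probs[tok]
                    # Assumption: the penalty only affects selection; the
                    # stored score stays the plain model log-probability.
                    augmented = model_score - diversity_strength * token_counts[tok]
                    candidates.append((augmented, model_score, tokens + [tok]))
            candidates.sort(key=lambda c: c[0], reverse=True)
            groups[g] = [(toks, s) for _, s, toks in candidates[:group_size]]
            for toks, _ in groups[g]:
                token_counts[toks[-1]] += 1
    return groups


def toy_model(prefix: Sequence[int]) -> List[float]:
    """Hypothetical stand-in model: the same distribution at every step."""
    return [math.log(p) for p in (0.6, 0.3, 0.1)]


plain = diverse_beam_search(toy_model, 3, 2, 2, 2, 0.0)
diverse = diverse_beam_search(toy_model, 3, 2, 2, 2, 1.0)
print([group[0][0] for group in plain])    # [[0, 0], [0, 0]]
print([group[0][0] for group in diverse])  # [[0, 0], [1, 1]]
```

With `diversity_strength=0.0` both groups decode the same sequence, which is the near-duplicate behaviour the abstract attributes to BS. A positive value pushes the second group away from tokens the first group already picked, while the first group still performs ordinary beam search.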

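Experimental Results

Research Questions

RQ1: Does diversity-augmented decoding produce outputs that are more diverse than those of standard beam search, and potentially better top-1 outputs?
RQ2: Does DBS keep computational and memory overhead close to that of beam search while improving results?
RQ3: How does output diversity affect performance on image captioning, machine translation, and visual question generation?
RQ4: How does image complexity affect the usefulness of diverse decoding?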
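
To make RQ1 measurable, the diversity of a decoded list has to be quantified somehow. This review does not say which diversity statistic the paper reports, so the following is only one possible proxy: the ratio of distinct n-grams to total n-grams across the list. The function name `distinct_ngram_ratio` and the example captions are hypothetical.

```python
from typing import Sequence


def distinct_ngram_ratio(sequences: Sequence[Sequence[str]], n: int) -> float:
    """Fraction of n-grams in the decoded list that are unique (0.0 if none)."""
    total = 0
    unique = set()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            unique.add(tuple(seq[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


near_duplicates = [["a", "dog", "runs"], ["a", "dog", "runs"]]
varied = [["a", "dog", "runs"], ["the", "cat", "sleeps"]]
print(distinct_ngram_ratio(near_duplicates, 1))  # 0.5
print(distinct_ngram_ratio(varied, 1))           # 1.0
```

A list of identical sequences scores low on this ratio and a list with no repeated n-grams scores 1.0, so higher values indicate a more varied list under this particular proxy.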

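Key Findings

DBS consistently outperforms standard beam search and previously proposed diverse decoding methods across tasks.
DBS produces diverse hypotheses with minimal additional computational or memory overhead.
DBS improves top-1 quality on image captioning, machine translation, and visual question generation.
Diversity plays a role in image-grounded language generation as image complexity increases, and DBS handles this effectively.
The results are supported by both quantitative metrics and qualitative human studies.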

This review was generated by AI and reviewed by human editors.