QUICK REVIEW

[Paper Review] Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering

Fengbin Zhu, Wenqiang Lei | arXiv (Cornell University) | Jan 4, 2021

Topic Modeling | References: 160 | Cited by: 153

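One-sentence summary

This survey examines open-domain question answering (OpenQA) with a focus on the Retriever-Reader architecture: it reviews retrieval methods (sparse, dense, and iterative) and discusses neural MRC, open challenges, and benchmarks.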

ABSTRACT

Open-domain Question Answering (OpenQA) is an important task in Natural Language Processing (NLP), which aims to answer a question in the form of natural language based on large-scale unstructured documents. Recently, there has been a surge in the amount of research literature on OpenQA, particularly on techniques that integrate with neural Machine Reading Comprehension (MRC). While these research works have advanced performance to new heights on benchmark datasets, they have been rarely covered in existing surveys on QA systems. In this work, we review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques. Specifically, we begin with revisiting the origin and development of OpenQA systems. We then introduce modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture as well as the specific techniques adopted in each of the components. We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used. We hope our work would enable researchers to be informed of the recent advancement and also the open challenges in OpenQA research, so as to stimulate further progress in this field.

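Research motivation and objectives

Trace the origin and development of OpenQA systems, from traditional approaches to neural ones.
Introduce and analyze the Retriever-Reader architecture and its components.
Survey sparse, dense, and iterative retrievers and their roles in OpenQA.
Discuss key challenges in OpenQA and give an overview of commonly used benchmark datasets.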

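Proposed approach

Reviews the evolution of OpenQA from traditional pipelines to modern neural end-to-end systems.
Proposes a taxonomy of Retriever-Reader OpenQA systems and analyzes the techniques used in each component.
Classifies retrievers as sparse, dense, or iterative and describes their mechanisms and trade-offs.
Discusses end-to-end training paradigms and the neural MRC models used for answer extraction.
Outlines challenges and benchmarks to guide future OpenQA research.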
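
To make the Retriever-Reader idea concrete, here is a minimal sketch, not the survey's own code. It assumes a toy three-passage corpus, a sparse retriever based on TF-IDF cosine similarity, and a placeholder `read` heuristic standing in for a neural MRC reader. All names (`PASSAGES`, `build_idf`, `retrieve`, `read`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Toy corpus (illustrative only, not taken from the survey).
PASSAGES = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is located in Paris.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(passages):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(passages)
    df = Counter()
    for p in passages:
        df.update(set(tokenize(p)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; out-of-vocabulary terms are dropped."""
    tf = Counter(tokenize(text))
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, passages, idf, k=2):
    """Sparse retriever: rank passages by TF-IDF cosine similarity."""
    q = tfidf_vector(question, idf)
    scored = [(cosine(q, tfidf_vector(p, idf)), i) for i, p in enumerate(passages)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [passages[i] for _, i in scored[:k]]

def read(question, passage):
    """Placeholder reader (an assumption for this sketch): return the first
    word of the passage that does not occur in the question. A real system
    would use a neural MRC model to extract an answer span instead."""
    q_terms = set(tokenize(question))
    for word in re.findall(r"[A-Za-z0-9]+", passage):
        if word.lower() not in q_terms:
            return word
    return None

idf = build_idf(PASSAGES)
question = "What is the capital of France?"
top_passages = retrieve(question, PASSAGES, idf)
answer = read(question, top_passages[0])
```

The split into `retrieve` and `read` mirrors the two stages: the retriever narrows a large collection down to a few passages, and the reader only has to look at those. Swapping the TF-IDF scorer for a dense encoder, or the heuristic for an MRC model, would leave the overall pipeline shape unchanged.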

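Results

Research questions

RQ1: How has open-domain question answering evolved historically, and how have neural MRC methods shaped modern systems?
RQ2: How does the Retriever-Reader architecture work, and what are the main variants and techniques for each component?
RQ3: What are the relative strengths, weaknesses, and limitations of sparse, dense, and iterative retrievers in OpenQA?
RQ4: What are the key challenges in OpenQA, and which benchmark datasets are commonly used to evaluate OpenQA systems?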

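Key findings

QA systems are generally divided into textual QA and knowledge-base QA; OpenQA aims to answer questions from large-scale unstructured text.
The dominant modern architecture is Retriever-Reader, often augmented with document and answer post-processing and with end-to-end training.
Retrievers fall into sparse, dense, and iterative types, each with different mechanisms and trade-offs for document retrieval.
Neural MRC models have become central to answer extraction, enabling end-to-end training and tighter integration with the retriever.
Dense retrievers address term mismatch through latent representations, while iterative retrievers enable multi-hop retrieval for complex questions.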
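
The last finding can be illustrated with a small sketch under strong simplifying assumptions. The "embeddings" below are hand-made concept vectors in a hypothetical `CONCEPTS` table, whereas a real dense retriever would learn them with a neural encoder. The passages, the question, and the query-update rule (adding the retrieved passage's vector to the query) are likewise illustrative choices, not the method of any specific system covered by the survey.

```python
# Hypothetical hand-made "embeddings": each axis stands for one concept.
# A real dense retriever would learn such vectors with a neural encoder.
CONCEPTS = {
    "hamlet": (1, 0, 0, 0),
    "author": (0, 1, 0, 0),
    "wrote": (0, 1, 0, 0),        # same concept as "author"
    "shakespeare": (0, 0, 1, 0),
    "paris": (0, 0, 0, 1),
    "france": (0, 0, 0, 1),
}

PASSAGES = [
    "Shakespeare wrote Hamlet",
    "Shakespeare grew up in Stratford",
    "Paris is in France",
]

def words(text):
    """Lowercase whitespace tokenization with the question mark removed."""
    return text.lower().replace("?", "").split()

def embed(text):
    """Sum the concept vectors of all known words; unknown words add nothing."""
    vec = [0, 0, 0, 0]
    for word in words(text):
        for i, x in enumerate(CONCEPTS.get(word, (0, 0, 0, 0))):
            vec[i] += x
    return vec

def dot(a, b):
    """Inner product, used here as the dense relevance score."""
    return sum(x * y for x, y in zip(a, b))

def lexical_overlap(question, passage):
    """Unweighted count of shared surface terms (a crude sparse score)."""
    return len(set(words(question)) & set(words(passage)))

def iterative_retrieve(question, passages, hops=2):
    """Multi-hop retrieval: after each hop, fold the retrieved passage
    into the query so the next hop can follow the newly found entity."""
    query = embed(question)
    chain = []
    for _ in range(hops):
        candidates = [p for p in passages if p not in chain]
        best = max(candidates, key=lambda p: dot(query, embed(p)))
        chain.append(best)
        query = [q + e for q, e in zip(query, embed(best))]
    return chain

question = "Which town is the author of Hamlet from?"
chain = iterative_retrieve(question, PASSAGES)
```

In this toy setup, unweighted term overlap gives the first and third passages the same score, because "author" and "wrote" do not match on the surface, while the concept vectors separate them. The second passage shares nothing with the original question and only becomes retrievable in the second hop, once "Shakespeare" has entered the query through the first retrieved passage.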


This review was generated by AI and reviewed by human editors.