Skip to main content
QUICK REVIEW

[论文解读] RAG based Question-Answering for Contextual Response Prediction System

Sriram Veturi, Saurabh Vaichal|arXiv (Cornell University)|Sep 5, 2024
Seismology and Earthquake Studies被引用 5
一句话总结

该论文提出一个端到端的检索增强生成(RAG)框架,用于零售客户联系中心的知识基础回应预测系统,与基于BERT的系统相比,提升了准确性并减少了幻觉。

ABSTRACT

Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks, including their potential as effective question-answering systems. However, to provide precise and relevant information in response to specific customer queries in industry settings, LLMs require access to a comprehensive knowledge base to avoid hallucinations. Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge. Yet, developing an accurate question-answering framework for real-world applications using RAG entails several challenges: 1) data availability issues, 2) evaluating the quality of generated content, and 3) the costly nature of human evaluation. In this paper, we introduce an end-to-end framework that employs LLMs with RAG capabilities for industry use cases. Given a customer query, the proposed system retrieves relevant knowledge documents and leverages them, along with previous chat history, to generate response suggestions for customer service agents in the contact centers of a major retail company. Through comprehensive automated and human evaluations, we show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent support to human customer service representatives by lightening their workload.

研究动机与目标

  • 在客户服务的行业场景中演示一个可投入生产的RAG框架。
  • 识别用于准确、可核验响应的最佳嵌入、检索和提示配置。
  • 使用人工和自动化指标对比评估RAG与现有基于BERT的系统。
  • 评估ReAct和提示技术在实时环境中的可行性与延迟。

提出的方法

  • 构建一个领域特定数据集,包含KB文章、领域内问答和领域外问答。
  • 系统性评估嵌入策略(USE、Vertex AI、SBERT)和检索方法(ScaNN、KNN HNSW)。
  • 使用PaLM2生成器进行响应生成,并用检索到的KB文章对输出进行校验。
  • 调整检索阈值以过滤不必要的检索并优化效率。
  • 采用自动化指标和人工评估来评估性能,包括AlignScore和语义相似度。
  • 部署生产端点并与代理UI集成以实现实时建议。
Figure 1. Example of the Response Prediction System. (A) : For a valid query, the system retrieves the relevant document and proposes the appropriate responses from where the agent choose. (B) : For an out-of-domain query, it guides the user to ask a relevant question.
Figure 1. Example of the Response Prediction System. (A) : For a valid query, the system retrieves the relevant document and proposes the appropriate responses from where the agent choose. (B) : For an out-of-domain query, it guides the user to ask a relevant question.

实验结果

研究问题

  • RQ1RQ1:在该领域中,嵌入技术、检索策略和提示方法如何影响RAG的性能?
  • RQ2RQ2:基于RAG的响应是否在帮助人工代理方面优于现有的基于BERT的系统?
  • RQ3RQ3:ReAct提示能否在实时环境中提升事实准确性并减少幻觉?

主要发现

嵌入R@1R@3R@5
USE - - -
SBERT+15.36+9.42+8.22
Vertex AI+21.55+13.87+11.85
  • 本数据中,Vertex AI嵌入与ScaNN检索在Recall@K方面优于其他嵌入。
  • 检索阈值约为0.7时能够有效区分何时需要检索,从而提高效率。
  • 相对于基于BERT的系统,RAG生成的回答在自动评估中将准确性提高了10.15%,幻觉减少了4.76%。
  • 人工评估者在75%的情形下更偏好RAG生成的回答。
  • 与BERT基系统相比,RAG在AlignScore平均提高5.6%,语义相似度提高20%。
  • ReAct和链式提示在延迟可接受性方面收益有限;在本场景中,像CoVe/CoTP这样的提示技术未带来收益。
Figure 2. Overview of the systems: (A) Agents respond to queries by manually searching for relevant documents, (B) The existing BERT-based system, which extracts relevant Q/A pairs from the given query and provides suggested answers to the agents, (C) The proposed RAG LLM system, where the LLM retri
Figure 2. Overview of the systems: (A) Agents respond to queries by manually searching for relevant documents, (B) The existing BERT-based system, which extracts relevant Q/A pairs from the given query and provides suggested answers to the agents, (C) The proposed RAG LLM system, where the LLM retri

更好的研究,从现在开始

从论文设计到论文写作,大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成,并经人工编辑审核。