QUICK REVIEW

[Paper Review] RAG based Question-Answering for Contextual Response Prediction System

Sriram Veturi, Saurabh Vaichal|arXiv (Cornell University)|Sep 5, 2024

Seismology and Earthquake Studies|Cited by 5

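One-Line Summary

This paper presents an end-to-end Retrieval-Augmented Generation (RAG) framework for a response prediction system grounded in the knowledge base of a retail contact center, and reports higher accuracy and fewer hallucinations than a BERT-based system.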

ABSTRACT

Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks, including their potential as effective question-answering systems. However, to provide precise and relevant information in response to specific customer queries in industry settings, LLMs require access to a comprehensive knowledge base to avoid hallucinations. Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge. Yet, developing an accurate question-answering framework for real-world applications using RAG entails several challenges: 1) data availability issues, 2) evaluating the quality of generated content, and 3) the costly nature of human evaluation. In this paper, we introduce an end-to-end framework that employs LLMs with RAG capabilities for industry use cases. Given a customer query, the proposed system retrieves relevant knowledge documents and leverages them, along with previous chat history, to generate response suggestions for customer service agents in the contact centers of a major retail company. Through comprehensive automated and human evaluations, we show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent support to human customer service representatives by lightening their workload.

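Motivation and Objectives

Demonstrate a production-ready RAG framework for an industry customer-service use case.
Identify the best configuration of embeddings, retrieval, and prompting for accurate, factually grounded responses.
Evaluate RAG against the existing BERT-based system using both human and automated metrics.
Assess the feasibility and latency of ReAct and other prompting techniques in a real-time setting.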

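Proposed Method

Build a domain-specific dataset containing KB articles, in-domain Q/A, and out-of-domain Q/A.
Systematically evaluate embedding strategies (USE, Vertex AI, SBERT) and retrieval methods (ScaNN, KNN HNSW).
Generate responses with a PaLM2 generator and ground the output in the retrieved KB articles.
Tune a retrieval threshold to skip unnecessary retrievals and optimize efficiency.
Evaluate performance with human evaluation and automated metrics, including AlignScore and semantic similarity.
Deploy a production endpoint and integrate it with the agent UI for real-time suggestions.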
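
The retrieve-then-ground flow in the steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy 3-dimensional vectors stand in for real embeddings, brute-force cosine similarity stands in for ScaNN or HNSW, and the names `retrieve`, `build_prompt`, and `toy_kb`, as well as the prompt wording, are hypothetical. The default threshold of 0.7 mirrors the value reported under Key Findings.

```python
import math
from typing import List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: List[float],
             kb: List[Tuple[str, List[float]]],
             k: int = 3,
             threshold: float = 0.7) -> List[Tuple[float, str]]:
    """Return up to k (score, article) pairs scoring at least `threshold`, best first."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in kb), reverse=True)
    return [(score, text) for score, text in scored[:k] if score >= threshold]


def build_prompt(query: str,
                 history: List[str],
                 docs: List[Tuple[float, str]]) -> Optional[str]:
    """Assemble a grounded prompt, or return None when nothing relevant was retrieved."""
    if not docs:
        # Out-of-domain case: the caller should ask the customer for a relevant question.
        return None
    articles = "\n".join(f"- {text}" for _, text in docs)
    chat = "\n".join(history)
    return (
        "Answer using only the knowledge articles below.\n"
        f"Articles:\n{articles}\n"
        f"Chat history:\n{chat}\n"
        f"Customer: {query}\n"
        "Suggested reply:"
    )


# Hypothetical knowledge base with made-up embeddings, for illustration only.
toy_kb = [
    ("Returns are accepted within 30 days.", [1.0, 0.0, 0.0]),
    ("Standard shipping takes 5 days.", [0.0, 1.0, 0.0]),
]

in_domain_docs = retrieve([0.9, 0.1, 0.0], toy_kb)
out_of_domain_docs = retrieve([0.0, 0.0, 1.0], toy_kb)
prompt = build_prompt("Can I return my order?", ["Customer: Hi"], in_domain_docs)
```

In a deployment, the returned prompt would be sent to the generator, and a `None` result would trigger the out-of-domain guidance shown in panel (B) of Figure 1.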

Figure 1. Example of the Response Prediction System. (A): For a valid query, the system retrieves the relevant document and proposes appropriate responses from which the agent chooses. (B): For an out-of-domain query, it guides the user to ask a relevant question.

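Experimental Results

Research Questions

RQ1: How do embedding techniques, retrieval strategies, and prompting methods affect RAG performance in this domain?
RQ2: Do RAG-based responses outperform the existing BERT-based system in assisting human agents?
RQ3: Can ReAct prompting improve factuality and reduce hallucinations in a real-time setting?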

Key Findings

Embedding	R@1	R@3	R@5
USE (baseline)	-	-	-
SBERT	+15.36	+9.42	+8.22
Vertex AI	+21.55	+13.87	+11.85
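
For readers unfamiliar with the R@K columns, Recall@K can be computed as in the sketch below. It assumes exactly one relevant article per query, which the review does not state, and the query and document IDs are invented for illustration. The signed values in the table appear to be differences from the USE row; this sketch computes only the raw metric.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Fraction of queries whose relevant document appears in the top-k ranked results.

    Assumes a single relevant document per query (an assumption of this sketch).
    """
    if not relevant:
        return 0.0
    hits = sum(1 for query, gold in relevant.items() if gold in ranked.get(query, [])[:k])
    return hits / len(relevant)


# Hypothetical retrieval output: ranked document IDs per query, best first.
toy_ranked = {
    "q1": ["a", "b", "c"],
    "q2": ["b", "a", "c"],
    "q3": ["c", "b", "a"],
    "q4": ["b", "c", "d"],
}
# Hypothetical ground truth: document "a" is the relevant one for every query.
toy_relevant = {"q1": "a", "q2": "a", "q3": "a", "q4": "a"}

r_at_1 = recall_at_k(toy_ranked, toy_relevant, k=1)  # 0.25: only q1 ranks "a" first
r_at_3 = recall_at_k(toy_ranked, toy_relevant, k=3)  # 0.75: q4 never retrieves "a"
```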

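On this data, Vertex AI embeddings with ScaNN retrieval outperform the other embeddings on Recall@K.
A retrieval threshold of about 0.7 effectively identifies whether retrieval is needed and improves efficiency.
In automated evaluation, RAG-based responses improved accuracy by 10.15% and reduced hallucinations by 4.76% relative to the BERT-based system.
Human evaluators preferred the RAG-generated responses over the current model 75% of the time.
Compared with the BERT-based system, RAG improves AlignScore by 5.6% on average and semantic similarity by 20%.
ReAct and chained prompting offered only limited practical benefit given their latency; prompting techniques such as CoVe and CoTP yielded no gains in this setting.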
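
One plausible way to arrive at a retrieval threshold like the one reported above is to sweep candidate values over labeled queries and keep the one that best separates queries that need retrieval from those that do not. The review does not describe the authors' tuning procedure, so the sketch below is an assumption. The scores and labels are invented, and they are constructed so that the sweep lands on 0.7 purely for illustration.

```python
from typing import List, Tuple


def gate_accuracy(samples: List[Tuple[float, bool]], threshold: float) -> float:
    """Share of queries where (top score >= threshold) matches the needs-retrieval label."""
    correct = sum(1 for score, needs in samples if (score >= threshold) == needs)
    return correct / len(samples)


def best_threshold(samples: List[Tuple[float, bool]], candidates: List[float]) -> float:
    """Return the candidate threshold with the highest gate accuracy (first one on ties)."""
    return max(candidates, key=lambda t: gate_accuracy(samples, t))


# Invented (top similarity score, needs retrieval?) pairs, not taken from the paper.
toy_samples = [
    (0.91, True),
    (0.82, True),
    (0.74, True),
    (0.66, False),
    (0.41, False),
    (0.30, False),
]

chosen = best_threshold(toy_samples, [0.5, 0.6, 0.7, 0.8])
```

Accuracy is only one possible selection criterion; a deployment that treats a missed retrieval as more costly than an unnecessary one might weight the two error types differently.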

Figure 2. Overview of the systems: (A) Agents respond to queries by manually searching for relevant documents, (B) The existing BERT-based system, which extracts relevant Q/A pairs from the given query and provides suggested answers to the agents, (C) The proposed RAG LLM system, where the LLM retri


This review was generated by AI and checked by a human editor.