QUICK REVIEW

[Paper Review] RAG based Question-Answering for Contextual Response Prediction System

Sriram Veturi, Saurabh Vaichal|arXiv (Cornell University)|2024. 09. 05.

Seismology and Earthquake Studies | Citations: 5

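One-line summary

This paper presents an end-to-end Retrieval-Augmented Generation (RAG) framework for a knowledge-based response prediction system in a retail contact center, and reports higher accuracy and fewer hallucinations than the existing BERT-based system.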

ABSTRACT

Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks, including their potential as effective question-answering systems. However, to provide precise and relevant information in response to specific customer queries in industry settings, LLMs require access to a comprehensive knowledge base to avoid hallucinations. Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge. Yet, developing an accurate question-answering framework for real-world applications using RAG entails several challenges: 1) data availability issues, 2) evaluating the quality of generated content, and 3) the costly nature of human evaluation. In this paper, we introduce an end-to-end framework that employs LLMs with RAG capabilities for industry use cases. Given a customer query, the proposed system retrieves relevant knowledge documents and leverages them, along with previous chat history, to generate response suggestions for customer service agents in the contact centers of a major retail company. Through comprehensive automated and human evaluations, we show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent support to human customer service representatives by lightening their workload.

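Research motivation and goals

Demonstrate a production-ready RAG framework for an industrial customer service use case.
Identify the best embedding, retrieval, and prompt configuration for accurate, grounded responses.
Compare RAG against the existing BERT-based system using both human and automated metrics.
Assess the feasibility and latency of ReAct and other prompting techniques in a real-time setting.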

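Proposed method

Build a domain-specific dataset comprising KB documents, in-domain Q/A pairs, and out-of-domain Q/A pairs.
Systematically evaluate embedding strategies (USE, Vertex AI, SBERT) and retrieval methods (ScaNN, KNN HNSW).
Use a PaLM2 generator to produce responses, grounding its output in the retrieved KB documents.
Tune a retrieval threshold to filter out unnecessary retrievals and improve efficiency.
Evaluate performance with automated metrics such as AlignScore and semantic similarity, together with human evaluation.
Deploy a production endpoint and integrate it with the agent UI for real-time suggestions.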
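
The threshold-gated retrieval step in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model (the paper compares USE, SBERT, and Vertex AI), a brute-force cosine search stands in for ScaNN or HNSW, and `retrieve`, `KB_DOCS`, and the parameter names are invented for the example. The default threshold of 0.7 mirrors the value reported in the results, although similarity scores from this toy embedding are not comparable to those of a real model.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base; real KB documents would be much longer.
KB_DOCS = [
    "return policy for online orders",
    "shipping times for standard delivery",
    "store opening hours",
]

def embed(text):
    """Assumed stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, kb_docs, k=3, threshold=0.7):
    """Return up to k (score, doc) pairs whose similarity clears the threshold.

    An empty list means no document is similar enough, so the caller can
    skip grounding and handle the query without retrieved context.
    """
    q = embed(query)
    scored = sorted(((cosine(q, embed(d)), d) for d in kb_docs), reverse=True)
    return [(score, doc) for score, doc in scored[:k] if score >= threshold]

if __name__ == "__main__":
    print(retrieve("return policy online orders", KB_DOCS))
    print(retrieve("hello there", KB_DOCS))
```

In this sketch an empty result is the signal that retrieval is unnecessary or unsupported for the query; how the production system acts on that signal is not specified here.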

Figure 1. Example of the Response Prediction System. (A): For a valid query, the system retrieves the relevant document and proposes appropriate responses from which the agent chooses. (B): For an out-of-domain query, it guides the user to ask a relevant question.

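Experimental results

Research questions

RQ1: How do embedding techniques, retrieval strategies, and prompting methods affect RAG performance in this domain?
RQ2: Are RAG-based responses better than the existing BERT-based system at assisting human agents?
RQ3: Does ReAct prompting contribute to factual accuracy and reduced hallucination in a real-time setting?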

Key results

Embedding	R@1	R@3	R@5
USE	-	-	-
SBERT	+15.36	+9.42	+8.22
Vertex AI	+21.55	+13.87	+11.85

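Vertex AI embeddings with ScaNN retrieval outperform the other embeddings on Recall@K for this data.
A retrieval threshold of roughly 0.7 effectively separates queries that need retrieval from those that do not, improving efficiency.
In automated evaluation, RAG-based responses show a 10.15% increase in accuracy and a 4.76% reduction in hallucination relative to the BERT-based system.
Human evaluators preferred the RAG-generated responses over those of the existing model 75% of the time.
Compared with the BERT-based system, RAG raises AlignScore by 5.6% on average and improves semantic similarity by 20%.
ReAct and chained prompting offered limited gains in terms of latency, and prompting techniques such as CoVe and CoTP provided no benefit in this setting.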
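
For readers unfamiliar with the Recall@K metric used in the embedding comparison, the following is a minimal sketch of how it is commonly computed. It assumes exactly one relevant KB document per query, which may not match the paper's evaluation setup; the function names and the toy document IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the relevant document is among the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(results, k):
    """Average Recall@K over queries.

    results: list of (ranked_ids, relevant_id) pairs, one per query, where
    ranked_ids is the retriever's output ordered from best to worst.
    """
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in results) / len(results)

# Hypothetical retrieval output for three queries (assumed data, not from the paper).
RESULTS = [
    (["d1", "d2", "d3"], "d1"),  # relevant document ranked first
    (["d4", "d1", "d2"], "d2"),  # relevant document ranked third
    (["d3", "d4", "d5"], "d1"),  # relevant document not retrieved
]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(f"R@{k} = {mean_recall_at_k(RESULTS, k):.3f}")
```

Running the same computation once per embedding model and comparing the averages against a baseline gives the kind of comparison summarized in the table above.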

Figure 2. Overview of the systems: (A) Agents respond to queries by manually searching for relevant documents, (B) The existing BERT-based system, which extracts relevant Q/A pairs from the given query and provides suggested answers to the agents, (C) The proposed RAG LLM system, where the LLM retri

This review was generated by AI and reviewed by a human editor.