QUICK REVIEW

[Paper Review] FACTS About Building Retrieval Augmented Generation-based Chatbots

Rama Akkiraju, Anbang Xu|arXiv (Cornell University)|2024. 07. 10.
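AI in Service Interactions | Citations: 9
One-Line Summary

This paper presents the FACTS framework for enterprise-grade RAG-based chatbots and details 15 control points across the RAG pipeline, the flexible NVBot platform architecture, and the empirical trade-offs between large and small LLMs in enterprise settings.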

ABSTRACT

Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.

Research Motivation and Goals

  • Define the FACTS framework (Freshness, Architecture, Cost, Testing, Security) for enterprise RAG chatbots.
  • Identify and describe 15 control points in RAG pipelines and provide optimization strategies for each point.
  • Propose a flexible, multi-bot NVBot platform that supports domain-specific, enterprise-wide, and copilot architectures.
  • Evaluate the cost-performance trade-offs between large and small LLMs in enterprise chatbot deployments.
  • Illustrate practical experiences from building three NVIDIA chatbots (NVInfo, NVHelp, Scout) to validate the framework.

Proposed Method

  • Describe the five FACTS dimensions and argue their importance for enterprise RAG chatbot success.
  • Enumerate and explain 15 control points of RAG pipelines (RAG-C and RAGOps) with corresponding remediation strategies.
  • Present an architecture for a modular, pluggable NVBot platform enabling selection of LLMs, vector databases, embeddings, and agents.
  • Provide empirical comparisons of accuracy and latency across large vs. small LLMs in enterprise domains using real NVIDIA data.
  • Discuss multi-bot architectures (domain-specific, enterprise-wide, copilot) and the role of guardrails, data governance, and access control in deployment.
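
The access-control point in the last bullet can be illustrated with a minimal sketch. It assumes each retrieved chunk carries an access-control list as metadata and that filtering happens after retrieval and before prompt construction. The `Chunk` type, the `allowed_groups` field, the `filter_by_access` helper, and the sample documents are all hypothetical names made up for this illustration; they are not taken from the paper or from the NVBot platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    # Hypothetical ACL metadata stored alongside each chunk at ingestion time.
    allowed_groups: frozenset


def filter_by_access(chunks, user_groups):
    """Keep only chunks the user may see, preserving retrieval order."""
    groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]


# Illustrative retrieval results (made-up data).
retrieved = [
    Chunk("hr-001", "Benefits enrollment opens in November.", frozenset({"all-employees"})),
    Chunk("fin-042", "Draft earnings figures.", frozenset({"finance"})),
    Chunk("it-007", "VPN setup guide.", frozenset({"all-employees", "contractors"})),
]

visible = filter_by_access(retrieved, ["all-employees"])
print([c.doc_id for c in visible])
```

Filtering before the prompt is assembled means restricted text never reaches the LLM, so the model cannot leak it in a response. A production system would more likely push the same check into the vector database query itself.
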
Figure 1. Control Points in a typical RAG pipeline when building Chatbots.

Experimental Results

Research Questions

  • RQ1: What are the key challenges to building and deploying enterprise-grade generative AI chatbots using RAG?
  • RQ2: How can the FACTS dimensions and the 15 control points be used to optimize RAG-based chatbot performance?
  • RQ3: What are the trade-offs between accuracy and latency when using large versus small LLMs in enterprise chatbots?
  • RQ4: How should organizations balance architecture flexibility with guardrails, security, and cost in a scalable chatbot platform?

Key Findings

  • A framework called FACTS (Freshness, Architecture, Cost, Testing, Security) captures the core dimensions needed for enterprise RAG chatbots.
  • Fifteen control points in RAG pipelines critically influence accuracy, latency, and safety, with metadata enrichment, chunking, query rephrasal, and query reranking having high impact.
  • Hybrid search combining lexical and vector search improves retrieval coverage and relevance.
  • Flexible architectures (NVBot) support multiple bots and domain-specialized copilots within a unified platform.
  • Empirical results show trade-offs between large and small LLMs in terms of accuracy and latency, with open-source LLMs sometimes matching or approaching larger models in enterprise tasks.
  • Security, access control, data governance, and guardrails are essential to prevent data leakage and ensure policy-compliant responses.
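
The hybrid-search finding above can be made concrete with a small sketch. The review does not say how the lexical and vector results are merged, so the example below assumes reciprocal rank fusion (RRF), a common fusion technique chosen here purely for illustration. The function name, the constant `k=60`, and the document IDs are assumptions, not details from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative rankings (made-up data): one from a keyword retriever
# such as BM25, one from an embedding-based vector search.
lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Because RRF uses only rank positions, it needs no calibration between keyword scores and embedding similarities, which sit on different scales. Documents found by only one retriever (`doc_b`, `doc_d`) still appear in the fused list, which is the coverage benefit the finding describes.
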
Figure 2. Agent architecture for handling complex queries

This review was generated by AI and reviewed by a human editor.