QUICK REVIEW

[Paper Review] FACTS About Building Retrieval Augmented Generation-based Chatbots

Rama Akkiraju, Anbang Xu|arXiv (Cornell University)|Jul 10, 2024

AI in Service Interactions | Cited by 9

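One-Sentence Summary

This paper proposes the FACTS framework for enterprise-grade RAG-based chatbots. It describes 15 control points across the RAG pipeline, the flexible NVBot platform architecture, and empirical tradeoffs between large and small models in enterprise settings.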

ABSTRACT

Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.

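Motivation and Objectives

Define the FACTS framework (Freshness, Architectures, Cost, Testing, Security) for enterprise RAG chatbots.
Identify and describe the 15 control points in the RAG pipeline, with an optimization strategy for each.
Propose a flexible, multi-bot NVBot platform that supports domain-specific, enterprise-wide, and copilot architectures.
Evaluate the cost and performance tradeoffs between large and small models in enterprise chatbot deployments.
Validate the framework through hands-on experience building three NVIDIA chatbots (NVInfo, NVHelp, Scout).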

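Proposed Approach

Describes the five FACTS dimensions and argues why each matters for a successful enterprise RAG chatbot.
Enumerates and explains the 15 control points of the RAG pipeline (RAG-C and RAGOps), with a remediation strategy for each.
Presents a modular, pluggable NVBot platform architecture that supports a choice of LLMs, vector databases, embeddings, and agents.
Compares large and small models empirically on accuracy and latency in the enterprise domain, using real NVIDIA data.
Discusses multi-bot architectures (domain-specific, enterprise-wide, copilot) and the role of guardrails, data governance, and access control in deployment.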
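
To make the ideas of pluggable components and access control more concrete, here is a minimal Python sketch. It is not the NVBot implementation: the names `Document`, `Retriever`, `KeywordRetriever`, `Bot`, and `echo_llm` are hypothetical, the retriever is a toy keyword matcher, and the `allowed_groups` field is an assumed way of carrying access-control metadata. The sketch shows only how a retriever and a generator can be swapped behind a small interface, and how documents a user may not see can be dropped before they reach the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL metadata attached at ingestion


class Retriever(Protocol):
    """Any component with this method can be plugged into Bot."""

    def retrieve(self, query: str, k: int) -> list[Document]: ...


class KeywordRetriever:
    """Toy retriever: ranks documents by the number of shared query terms."""

    def __init__(self, docs: list[Document]) -> None:
        self.docs = docs

    def retrieve(self, query: str, k: int) -> list[Document]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.text.lower().split())), d) for d in self.docs]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [d for _, d in ranked[:k]]


@dataclass
class Bot:
    retriever: Retriever
    generate: Callable[[str, list[Document]], str]  # stand-in for an LLM call

    def answer(self, query: str, user_groups: set[str], k: int = 3) -> str:
        candidates = self.retriever.retrieve(query, k)
        # Enforce access control before any document text reaches the prompt.
        visible = [d for d in candidates if d.allowed_groups & user_groups]
        if not visible:
            return "No accessible documents found."
        return self.generate(query, visible)


def echo_llm(query: str, docs: list[Document]) -> str:
    """Placeholder generator: reports which documents it was given."""
    refs = ", ".join(d.doc_id for d in docs)
    return f"Answer to '{query}' based on: {refs}"
```

In this sketch the filter runs after retrieval, so a user with narrow permissions may receive fewer than `k` documents. Filtering inside the vector database query is a common alternative; the summary does not say which approach the paper uses.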

Figure 1. Control Points in a typical RAG pipeline when building Chatbots.

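Experimental Results

Research Questions

RQ1: What are the key challenges in building and deploying enterprise-grade generative AI chatbots with RAG?
RQ2: How can the FACTS dimensions and the 15 control points be used to optimize the performance of RAG-based chatbots?
RQ3: What are the accuracy and latency tradeoffs between large and small models in enterprise chatbots?
RQ4: How should organizations balance architectural flexibility, guardrails, security, and cost in a scalable chatbot platform?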

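Key Findings

The FACTS framework (Freshness, Architectures, Cost, Testing, Security) captures the core dimensions that enterprise RAG chatbots must address.
The fifteen control points in the RAG pipeline critically affect accuracy, latency, and security; metadata enrichment, chunking, query rephrasing, and result reranking have the largest impact.
Hybrid search, which combines lexical search with vector search, improves retrieval coverage and relevance.
A flexible architecture (NVBot) supports multiple bots and domain-specific copilots within a single platform.
Empirical results show accuracy and latency tradeoffs between large and small models; open-source LLMs sometimes match or approach larger models on enterprise tasks.
Security, access control, data governance, and guardrails are essential for preventing data leakage and ensuring compliant responses.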
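
The hybrid search finding can be illustrated with a small Python sketch. The summary does not say how the paper merges lexical and vector results, so this example assumes reciprocal rank fusion (RRF), a common fusion technique, with the conventional constant `k = 60`. The function names, the toy term-count scorer, and the hand-written embedding vectors are all hypothetical; a real system would likely use BM25 and a learned embedding model.

```python
import math
from collections import Counter


def lexical_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank doc ids by how often query terms occur (toy lexical search)."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = sum(counts[t] for t in terms)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [doc_id for doc_id, score in ranked if score > 0]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Rank doc ids by cosine similarity to the query embedding."""
    return sorted(doc_vecs, key=lambda d: -cosine(query_vec, doc_vecs[d]))


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings: each list contributes 1 / (k + rank) per document."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: -fused[d])
```

A document that only one retriever finds still appears in the fused list, which is where the coverage gain comes from; a document that both retrievers rank highly rises to the top, which is where the relevance gain comes from.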

Figure 2. Agent architecture for handling complex queries


This review was generated by AI and reviewed by human editors.