QUICK REVIEW

[Paper Review] Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

Yujia Zhou, Yan Liu|arXiv (Cornell University)|Sep 16, 2024

Caching and Content Delivery|Cited by 9

One-Sentence Summary

A comprehensive framework and benchmark for evaluating the trustworthiness of Retrieval-Augmented Generation (RAG) systems across six dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.

ABSTRACT

Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs). While much of the current research in this field focuses on performance optimization, particularly in terms of accuracy and efficiency, the trustworthiness of RAG systems remains an area still under exploration. From a positive perspective, RAG systems are promising to enhance LLMs by providing them with useful and up-to-date knowledge from vast external databases, thereby mitigating the long-standing problem of hallucination. While from a negative perspective, RAG systems are at the risk of generating undesirable contents if the retrieved information is either inappropriate or poorly utilized. To address these concerns, we propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, we thoroughly review the existing literature on each dimension. Additionally, we create the evaluation benchmark regarding the six dimensions and conduct comprehensive evaluations for a variety of proprietary and open-source models. Finally, we identify the potential challenges for future research based on our investigation results. Through this work, we aim to lay a structured foundation for future investigations and provide practical insights for enhancing the trustworthiness of RAG systems in real-world applications.

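Research Motivation and Objectives

Define a unified six-dimension framework for trustworthiness in RAG systems (factuality, robustness, fairness, transparency, accountability, privacy).
Survey the existing literature on each dimension of RAG trustworthiness.
Build a practical benchmark for evaluating the trustworthiness of LLMs, and provide actionable guidance for future RAG development.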

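Proposed Method

Propose six trustworthiness dimensions and integrate them into a unified RAG evaluation framework.
Review the existing literature and categorize methods by dimension, method type, and target.
Build and apply a benchmark that evaluates ten LLMs (proprietary and open-source) across the six dimensions.
Review representative methods and findings on factuality, robustness, fairness, transparency, accountability, and privacy.
Present a timeline of trustworthiness research up to July 2024, and outline existing gaps and opportunities.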
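
To make the idea of scoring a model along six dimensions concrete, here is a minimal sketch of how per-dimension results might be collected and averaged. It is not the paper's benchmark: the class name `TrustReport`, the [0, 1] score scale, and the plain mean are all assumptions made for illustration, and the source does not say how the authors score or aggregate each dimension.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean


class Dimension(Enum):
    """The six trustworthiness dimensions named in the survey."""
    FACTUALITY = "factuality"
    ROBUSTNESS = "robustness"
    FAIRNESS = "fairness"
    TRANSPARENCY = "transparency"
    ACCOUNTABILITY = "accountability"
    PRIVACY = "privacy"


@dataclass
class TrustReport:
    """Hypothetical container for one model's per-example scores."""
    model: str
    # Assumption: every example gets a score in [0, 1] for one dimension.
    scores: dict = field(default_factory=lambda: {d: [] for d in Dimension})

    def add(self, dimension: Dimension, score: float) -> None:
        """Record one example-level score for the given dimension."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must lie in [0, 1]")
        self.scores[dimension].append(score)

    def summary(self) -> dict:
        """Average each dimension; None marks a dimension with no data."""
        return {d.value: (mean(v) if v else None) for d, v in self.scores.items()}


# Usage with made-up scores for a placeholder model name.
report = TrustReport("model-a")
report.add(Dimension.FACTUALITY, 1.0)
report.add(Dimension.FACTUALITY, 0.0)
report.add(Dimension.PRIVACY, 1.0)
print(report.summary())
```

Keeping the six dimensions separate in the summary, rather than collapsing them into one number, matches the survey's framing of trustworthiness as six distinct axes. Any single combined score would need a weighting that the source does not provide.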

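Experimental Results

Research Questions

RQ1: Which six dimensions make up a comprehensive trustworthiness framework for RAG systems?
RQ2: How do existing methods address each dimension in the RAG context?
RQ3: How do different LLMs perform on factuality, robustness, fairness, transparency, accountability, and privacy in RAG settings?
RQ4: Which benchmarks and evaluation methods are effective for measuring RAG trustworthiness?
RQ5: What are the key challenges and future directions for improving trustworthy RAG deployment?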

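Key Findings

A unified framework with six trustworthiness dimensions is proposed for RAG systems.
The literature review highlights gaps and promising directions across the six dimensions.
A benchmark is built and applied to evaluate 10 LLMs on trustworthiness.
Representative methods illustrate approaches to improving factuality, robustness, accountability, and privacy.
The discussion identifies challenges and future research directions for improving trustworthy RAG deployment.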

This review was generated by AI and reviewed by a human editor.