QUICK REVIEW

[Paper Review] VerifAI: Verified Generative AI

Nan Tang, Chenyu Yang | arXiv (Cornell University) | Jul 6, 2023

Data Quality and Management | Cited by 11

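One-Sentence Summary

VerifAI proposes a modular framework that verifies generative AI outputs by retrieving evidence from multi-modal data lakes and reasoning over it, with the aim of making generated tuples, tables, and text more reliable.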

ABSTRACT

Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences such as inaccurate decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.

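Motivation and Objectives

Advance the verification of generative AI outputs from a data management perspective.
Develop a modular framework that indexes data-lake content, reranks retrieved candidates, and checks generated data against data-lake evidence.
Demonstrate feasibility through preliminary experiments on tuple and text verification.
Highlight open problems and challenges in cross-modal data discovery and verification.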

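Proposed Method

An Indexer with content-based (Elasticsearch) and vector-based (Faiss) indexes to cover multi-modal data.
A Reranker that provides fine-grained, task-specific ranking (text-to-text via ColBERT; text-to-table via OpenTFV).
A combination of Verifiers, including generic models (such as ChatGPT) and local models (OpenTFV/PASTA for table verification, a RoBERTa-based model for tuple verification).
Evidence-driven verification that maps each (g, x) pair of generated data g and evidence x to a 0/1/2 label (verified, refuted, not related).
Provenance tracking of the verification lineage, with support for human debugging.
An experimental setup that evaluates retrieval from the data lake and its use in verifying generated tables and text.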
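
The Indexer-Reranker-Verifier flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it does not call Elasticsearch, Faiss, ColBERT, OpenTFV, PASTA, or ChatGPT, and replaces all of them with simple token-overlap stand-ins. Every name (`Label`, `Evidence`, `ToyIndexer`, `rerank`, `toy_verifier`, `verify`) is hypothetical, and the numeric label values assume the order in which the labels are listed above.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    # Assumption: values follow the order in which the labels are listed above.
    VERIFIED = 0
    REFUTED = 1
    NOT_RELATED = 2


@dataclass(frozen=True)
class Evidence:
    source: str  # where the evidence came from, kept for lineage
    text: str    # flattened content of a tuple, table row, or passage


def tokens(s: str) -> set:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


class ToyIndexer:
    """Stand-in for the coarse indexer: keeps items sharing a token with the query."""

    def __init__(self, lake):
        self.lake = list(lake)

    def search(self, query, k=10):
        q = tokens(query)
        return [e for e in self.lake if q & tokens(e.text)][:k]


def rerank(query, candidates, k=1):
    """Stand-in for the reranker: orders candidates by Jaccard token overlap."""
    q = tokens(query)

    def score(e):
        t = tokens(e.text)
        return len(q & t) / len(q | t)

    return sorted(candidates, key=score, reverse=True)[:k]


def toy_verifier(subject, value, evidence):
    """Stand-in for a verifier model: maps a (g, x) pair to a Label."""
    t = tokens(evidence.text)
    if not tokens(subject) <= t:
        return Label.NOT_RELATED  # evidence does not mention the subject
    return Label.VERIFIED if tokens(value) <= t else Label.REFUTED


def verify(subject, value, indexer):
    """Run indexer -> reranker -> verifier; return (label, evidence) pairs as lineage."""
    query = f"{subject} {value}"
    top = rerank(query, indexer.search(query))
    return [(toy_verifier(subject, value, e), e) for e in top]


lake = [
    Evidence("table:capitals", "France capital Paris"),
    Evidence("doc:travel", "Berlin is the capital of Germany"),
]
indexer = ToyIndexer(lake)
ok = verify("France capital", "Paris", indexer)   # generated value matches evidence
bad = verify("France capital", "Lyon", indexer)   # generated value contradicts evidence
```

Returning the evidence object alongside each label keeps a simple record of which source produced which verdict, which is the kind of lineage a human could inspect when debugging a verification result.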

Figure 1. Generative AI can generate (a) values in tuples and (b) text. Our system, VerifAI, tries to either verify or refute the generated value by reasoning over the (generated data, evidence) pair, where the evidence is discovered from data lakes.

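Experimental Results

Research Questions

RQ1: Can a modular verifier (Indexer-Reranker-Verifier) reliably use data-lake evidence to verify or refute generative AI outputs?
RQ2: How can multi-modal data lakes support the verification of generated tuples, tables, and textual claims?
RQ3: How do generic models and local models compare across modalities?
RQ4: What practical challenges (privacy, trust, provenance) arise when verifying generative AI outputs?

Key Findings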

Generated data type	Retrieved data type	Recall
tuple	tuple	0.99
text	text files	0.58
textual claim	table	0.88

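VerifAI retrieves data relevant to the verification task with high recall for tuple-to-tuple (0.99) and textual-claim-to-table (0.88) retrieval, but reaches only 0.58 for text-to-text retrieval.
As a verifier, ChatGPT reaches 0.88 accuracy on (tuple, tuple) verification and outperforms some specialized models in certain text-table scenarios; PASTA can surpass ChatGPT on (text, table) verification when relevant tables are retrieved.
Textual-claim verification benefits from retrieved tables: PASTA outperforms ChatGPT in some respects when the tables are relevant, while ChatGPT generalizes better when many of the retrieved tables are irrelevant.
The study stresses the importance of the provenance and trustworthiness of data sources, and identifies cross-modal discovery and verification as key open problems.
Retrieval is effective for (tuple, tuple) and (textual claim, table), but performs worse when retrieving text files related to tuple evidence.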
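
The recall figures above can be read with a small worked example. This summary does not say how recall is defined, so the sketch below assumes one common convention: a query counts as a hit if at least one relevant item appears among its top-k retrieved results. The function name `recall_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of queries whose top-k results contain at least one relevant item.

    retrieved: dict mapping query id -> ranked list of retrieved item ids
    relevant:  dict mapping query id -> set of relevant item ids
    """
    if not retrieved:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1
        for query_id, ranked in retrieved.items()
        if set(ranked[:k]) & relevant.get(query_id, set())
    )
    return hits / len(retrieved)


# Toy data (made up for illustration only).
retrieved = {
    "q1": ["t3", "t1", "t9"],
    "q2": ["t4", "t5", "t6"],
    "q3": ["t7", "t2", "t8"],
    "q4": ["t0", "t5", "t2"],
}
relevant = {"q1": {"t1"}, "q2": {"t2"}, "q3": {"t7"}, "q4": {"t2"}}

recall_top1 = recall_at_k(retrieved, relevant, k=1)  # only q3 hits at rank 1
recall_top3 = recall_at_k(retrieved, relevant, k=3)  # q1, q3, and q4 hit within 3
```

Under this convention, retrieval recall bounds what any downstream verifier can do: if no relevant evidence is retrieved for a query, the verifier has nothing correct to reason over.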

This review was generated by AI and reviewed by a human editor.