QUICK REVIEW

[Paper Review] From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process

Jaewoong Kim, Moohong Min|arXiv (Cornell University)|Jan 26, 2024

Statistical and Computational Modeling | Cited by 10

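One-Sentence Summary

QA-RAG extends Retrieval Augmented Generation by incorporating answers from a fine-tuned large language model (LLM) into a dual-track retrieval process, improving contextual relevance and final answer quality for pharmaceutical regulatory guidelines.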

ABSTRACT

Regulatory compliance in the pharmaceutical industry entails navigating through complex and voluminous guidelines, often requiring significant human resources. To address these challenges, our study introduces a chatbot model that utilizes generative AI and the Retrieval Augmented Generation (RAG) method. This chatbot is designed to search for guideline documents relevant to the user inquiries and provide answers based on the retrieved guidelines. Recognizing the inherent need for high reliability in this domain, we propose the Question and Answer Retrieval Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model demonstrated a significant improvement in accuracy, outperforming all other baselines including conventional RAG methods. This paper details QA-RAG's structure and performance evaluation, emphasizing its potential for the regulatory compliance domain in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.

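Research Motivation and Objectives

Address the inefficiency of navigating extensive pharmaceutical regulatory guidelines (FDA/ICH).
Propose a QA-RAG chatbot that combines a fine-tuned LLM's answer with the user query to improve retrieval.
Demonstrate that QA-RAG outperforms conventional RAG baselines in both context retrieval and answer generation.
Show the model's potential applicability to domain-specific regulations beyond the pharmaceutical industry.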

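Proposed Method

Uses dense document embeddings (embedding model: LLM-Embedder) and FAISS for scalable similarity search over 1,404 OCR-processed FDA/ICH guideline documents, split into 10,000-character chunks with a 2,000-character overlap.
Implements dual-track retrieval: documents are retrieved using both the user query and a hypothetical answer generated by an LLM fine-tuned on FDA Q&A data.
Fine-tunes two LLMs (ChatGPT 3.5-Turbo and Mistral-7B) on FDA FAQ data, compares them against GPT-4 using BertScore, and selects ChatGPT 3.5-Turbo for its best balance of precision and recall.
Applies a reranker (BGE reranker) to order the retrieved documents by relevance to the query ahead of the final answer generation stage.
Generates the final answer with a ChatGPT-3.5-Turbo final-answer agent using few-shot prompting.
Evaluates with an LLMs-as-judges framework (Ragas for context retrieval, BertScore for answer quality).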
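
The chunking and dual-track retrieval steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy bag-of-words cosine similarity stands in for LLM-Embedder plus FAISS, the function names (`chunk_text`, `dual_track_retrieve`) and the `k` values are hypothetical, and the hypothetical answer is hard-coded rather than produced by a fine-tuned LLM.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 10_000, overlap: int = 2_000) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last chunk already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a dense embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: list[str], k: int) -> list[int]:
    """Return indices of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, embed(docs[i])), reverse=True)
    return ranked[:k]


def dual_track_retrieve(question: str, hypothetical_answer: str, docs: list[str],
                        k_question: int = 1, k_answer: int = 2) -> list[int]:
    """Retrieve with the question and the hypothetical answer separately,
    then merge both result lists while dropping duplicates."""
    merged: list[int] = []
    for idx in top_k(question, docs, k_question) + top_k(hypothetical_answer, docs, k_answer):
        if idx not in merged:
            merged.append(idx)
    return merged


docs = [
    "stability testing of new drug substances requires long term storage data",
    "labeling requirements for over the counter products",
    "process validation should include three consecutive batches",
]
question = "how many batches are needed"
# Assumption: in QA-RAG this text would come from the fine-tuned LLM.
hypothetical = "process validation typically uses three consecutive batches and stability testing data"

print(dual_track_retrieve(question, hypothetical, docs))  # [2, 0]
```

In this toy run the question alone only reaches the process-validation document, while the hypothetical answer also pulls in the stability-testing document. In a fuller pipeline, the merged candidates would then go to a reranker before answer generation.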

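Experimental Results

Research Questions

RQ1: Can a QA-oriented RAG variant improve retrieval precision and recall for pharmaceutical regulatory guidelines?
RQ2: Does incorporating a fine-tuned LLM's hypothetical answer into retrieval improve final answer quality compared with conventional RAG and HyDE-style methods?
RQ3: In a regulated domain, how do fine-tuned LLMs and general-purpose LLMs differ in their effect on context retrieval and answer generation?

Key Findings

QA-RAG achieves higher context precision (0.717) and context recall (0.328) than the main baselines.
For answer generation, QA-RAG reaches precision 0.551, recall 0.645, and F1 0.591, outperforming the baselines.
Using hypothetical answers from the fine-tuned LLM markedly improves retrieval relevance compared with question-only or HyDE baselines.
Ablation studies show that the hypothetical-answer component greatly improves context precision; combining the question with the hypothetical answer yields the strongest performance.
The fine-tuned LLM (ChatGPT 3.5-Turbo) outperforms the other variants in this domain, supporting the effectiveness of domain-adaptive fine-tuning for regulatory tasks.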
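
A note on reading the answer-generation numbers: the reported F1 (0.591) is slightly below the harmonic mean of the reported precision and recall. One plausible explanation, assumed here rather than stated in this review, is that F1 is computed per question and then averaged, which can never exceed the F1 of the averaged precision and recall. The per-sample values in the sketch below are hypothetical and only illustrate the effect.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 implied by the reported average precision and recall.
harmonic = f1(0.551, 0.645)
print(round(harmonic, 3))  # 0.594

# Hypothetical per-question (precision, recall) pairs, for illustration only.
samples = [(0.4, 0.8), (0.7, 0.5)]

mean_p = sum(p for p, _ in samples) / len(samples)
mean_r = sum(r for _, r in samples) / len(samples)

f1_of_means = f1(mean_p, mean_r)                                # F1 of the averages
mean_of_f1 = sum(f1(p, r) for p, r in samples) / len(samples)   # average of per-sample F1

print(mean_of_f1 <= f1_of_means)
```

Because the harmonic mean is concave, averaging per-sample F1 scores gives a value at or below the F1 of the averaged inputs, which is consistent with 0.591 sitting just under 0.594.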


This review was generated by AI and checked by a human editor.