QUICK REVIEW

[Paper Review] Rationale-Guided Retrieval Augmented Generation for Medical Question Answering

Jiwoong Sohn, Yein Park | arXiv (Cornell University) | 2024. 11. 01.

Topic Modeling | Citations: 6

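One-Line Summary

RAG$^2$ improves medical QA through rationale-guided filtering, rationale-based query formulation, and balanced retrieval; it consistently raises LLM accuracy across benchmarks and model sizes.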

ABSTRACT

Large language models (LLM) hold significant potential for applications in biomedicine, but they struggle with hallucinations and outdated knowledge. While retrieval-augmented generation (RAG) is generally employed to address these issues, it also has its own set of challenges: (1) LLMs are vulnerable to irrelevant or incorrect context, (2) medical queries are often not well-targeted for helpful information, and (3) retrievers are prone to bias toward the specific source corpus they were trained on. In this study, we present RAG$^2$ (RAtionale-Guided RAG), a new framework for enhancing the reliability of RAG in biomedical contexts. RAG$^2$ incorporates three key innovations: a small filtering model trained on perplexity-based labels of rationales, which selectively augments informative snippets of documents while filtering out distractors; LLM-generated rationales as queries to improve the utility of retrieved snippets; a structure designed to retrieve snippets evenly from a comprehensive set of four biomedical corpora, effectively mitigating retriever bias. Our experiments demonstrate that RAG$^2$ improves the state-of-the-art LLMs of varying sizes, with improvements of up to 6.1\%, and it outperforms the previous best medical RAG model by up to 5.6\% across three medical question-answering benchmarks. Our code is available at https://github.com/dmis-lab/RAG2.

Motivation and Objectives

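Combine retrieval with generation to address hallucinations and outdated knowledge in biomedical LLMs.
Train a small filtering model on labels derived from rationale perplexity differences, so that informative snippets are kept and distractors are filtered out.
Use LLM-generated rationales as queries so that the retrieved snippets are more useful for QA.
Draw evidence evenly from four biomedical corpora to reduce corpus bias.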
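
The perplexity-difference idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the review says the filter's labels are perplexity-based and derived from rationales, but it does not give the exact rule. The function names, the `margin` parameter, and the "lower perplexity with the snippet means helpful" decision rule are all hypothetical. Token log-probabilities are assumed to come from some language model scoring the rationale, once without and once with the retrieved snippet in context.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability) over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def helpfulness_label(
    logprobs_without_doc: Sequence[float],
    logprobs_with_doc: Sequence[float],
    margin: float = 0.0,  # hypothetical threshold, not taken from the paper
) -> int:
    """Return 1 if adding the snippet lowers rationale perplexity by more than
    `margin`, else 0. The decision rule itself is an assumption."""
    ppl_without = perplexity(logprobs_without_doc)
    ppl_with = perplexity(logprobs_with_doc)
    return 1 if (ppl_without - ppl_with) > margin else 0
```

Labels produced this way could then serve as training targets for a small classifier that decides, per snippet, whether to pass it to the LLM.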

Proposed Method

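Train a small Flan-T5-based filter on perplexity-based labels that compare the rationale's perplexity with and without a retrieved document.
Use LLM-generated rationales as the query for retrieving factual evidence (rationale-based queries).
Retrieve an equal number of snippets from each of four corpora (PubMed, PMC, textbooks, clinical guidelines) to balance the sources.
Apply a reranker (MedCPT) after balanced retrieval to refine snippet relevance.
Evaluate with single-pass generation to avoid iterative, costly processes.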
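
The balanced-retrieval and reranking steps above can be sketched as follows. This is a minimal sketch, not the authors' implementation: `balanced_retrieve`, `rerank`, the `Retriever` and `Scorer` callable types, and `word_overlap_score` are hypothetical names introduced here. The word-overlap scorer is a toy stand-in for a real reranker such as MedCPT, and the query string is assumed to be an LLM-generated rationale.

```python
from typing import Callable, Dict, List, Tuple

Snippet = Tuple[str, str]                    # (corpus name, snippet text)
Retriever = Callable[[str, int], List[str]]  # (query, k) -> up to k snippet texts
Scorer = Callable[[str, str], float]         # (query, snippet text) -> relevance


def balanced_retrieve(
    query: str, retrievers: Dict[str, Retriever], k_per_corpus: int
) -> List[Snippet]:
    """Take the same number of snippets from every corpus, so that no single
    source dominates the pooled candidates."""
    pooled: List[Snippet] = []
    for corpus, retrieve in retrievers.items():
        for text in retrieve(query, k_per_corpus)[:k_per_corpus]:
            pooled.append((corpus, text))
    return pooled


def rerank(
    query: str, snippets: List[Snippet], score: Scorer, top_n: int
) -> List[Snippet]:
    """Sort the pooled snippets by relevance score (highest first), keep top_n."""
    ranked = sorted(snippets, key=lambda s: score(query, s[1]), reverse=True)
    return ranked[:top_n]


def word_overlap_score(query: str, text: str) -> float:
    """Toy relevance score: number of lowercase words shared by query and text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

In this arrangement the per-corpus quota is applied before reranking, so every corpus contributes candidates; the reranker then decides which of them actually reach the LLM.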

Experimental Results

Research Questions

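RQ1: Does rationale-guided filtering make retrieved snippets more useful to the base LLM?
RQ2: Do rationale-based queries improve evidence utilization and QA performance across medical benchmarks?
RQ3: Does balanced retrieval reduce retriever bias and improve coverage across corpora?
RQ4: What effect does RAG$^2$ have across different backbone LLMs and medical QA datasets?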

Key Results

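RAG$^2$ improves average accuracy by up to 6.1% across backbone LLMs.
RAG$^2$ outperforms the previous best medical RAG model by up to 5.6% across three medical QA benchmarks.
RAG$^2$ improves open-source, medical, and commercial LLMs, with notable gains (e.g., a marked increase for GPT-4o).
Balanced retrieval consistently outperforms MedRAG on the main benchmarks.
Ablation analysis shows that rationale-guided filtering and rationale-based queries both contribute substantially to the performance gains.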


This review was generated by AI and reviewed by a human editor.