QUICK REVIEW

[Paper Review] RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

Angels de Luis Balaguer, Vinamra Benara | arXiv (Cornell University) | 2024-01-16

Topic Modeling · Citations: 50

One-Line Summary

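This paper compares Retrieval-Augmented Generation (RAG) and fine-tuning pipelines for large language models, presents a detailed case study in agriculture, and reports that combining RAG with fine-tuning yields additional accuracy gains.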

ABSTRACT

There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application - what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. We see an accuracy increase of over 6 p.p. when fine-tuning the model and this is cumulative with RAG, which increases accuracy by 5 p.p. further. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.

Motivation and Objectives

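Motivates the question of how to incorporate domain-specific data into LLMs, via RAG or fine-tuning.
Develops and evaluates a multi-stage pipeline for domain knowledge (data extraction, Q&A generation, RAG, fine-tuning).
Quantifies tradeoffs, cost, and performance across models (Llama2-13B, GPT-3.5, GPT-4) in agriculture.
Demonstrates that the pipeline captures geography-specific knowledge and improves the localization of answers.
Provides actionable guidance for building industry copilots grounded in domain data.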

Proposed Method

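Proposes a multi-stage pipeline covering data collection, information extraction from PDFs, Q&A generation, and RAG-based answer generation.
Uses GROBID to convert PDFs into structured TEI/JSON, preserving document structure for grounding.
Generates Q&A pairs with the Guidance framework to control the input/output structure and the grounding context.
Synthesizes answers with RAG, combining embedding-based retrieval (sentence transformers + FAISS) with GPT-4.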
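Fine-tunes multiple models, including Llama2-13B and GPT-4; the locally trained open-weight runs (8x A100 GPUs) use LoRA and FSDP with mixed precision and a cosine learning-rate schedule.
Evaluates results with a suite of GPT-4-based metrics and compares RAG and fine-tuning across domains and geographies.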
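The retrieval step of the RAG pipeline can be sketched in a few lines. This is a minimal, dependency-free sketch under stated assumptions: a hashed bag-of-words `embed` function stands in for a sentence-transformers encoder, and `FlatIndex` is a hypothetical brute-force inner-product index playing the role a FAISS flat index would. The chunk texts and the `build_rag_prompt` template are illustrative placeholders, not the paper's data or prompts; in the described pipeline, the assembled prompt would then go to GPT-4.

```python
import math
import re
import zlib

# Assumption: a toy hashed bag-of-words encoder stands in for a
# sentence-transformers model; real embeddings capture meaning, not just tokens.
DIM = 1024


def embed(text: str, dim: int = DIM) -> list[float]:
    """Return an L2-normalized hashed bag-of-words vector for `text`."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class FlatIndex:
    """Hypothetical brute-force inner-product index (the role FAISS plays)."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.chunks.append(chunk)
            self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 2) -> list[tuple[str, float]]:
        """Return the top-k (chunk, score) pairs by cosine similarity."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(self.vectors)
        ]
        scored.sort(reverse=True)
        return [(self.chunks[i], score) for score, i in scored[:k]]


def build_rag_prompt(question: str, hits: list[tuple[str, float]]) -> str:
    """Augment the prompt with retrieved context (illustrative template)."""
    context = "\n".join(f"- {chunk}" for chunk, _ in hits)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Usage with placeholder chunks (not from the paper's dataset).
index = FlatIndex()
index.add([
    "Region A guide: irrigation scheduling for corn during dry summers.",
    "Region B guide: pest scouting calendar for soybean fields.",
    "Region A guide: soil sampling depth for nitrogen tests.",
])
question = "When should I schedule irrigation for corn in region A?"
hits = index.search(question, k=2)
prompt = build_rag_prompt(question, hits)
print(prompt)
```

Normalizing every vector makes the inner product equal to cosine similarity, which is why an inner-product index pairs naturally with normalized sentence embeddings. Swapping in a real encoder and index would change only `embed` and `FlatIndex`.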

Experimental Results

Research Questions

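RQ1: How do RAG and fine-tuning compare in improving LLM performance on industry-specific agricultural questions?
RQ2: How does spatially scoped fine-tuning affect the accuracy of geography-specific knowledge?
RQ3: What are the cost-performance tradeoffs of fine-tuning a large model (e.g., GPT-4) compared with using RAG alone or in combination?
RQ4: Can a geography-aware Q&A pipeline improve answer specificity and cross-geography knowledge transfer?
RQ5: How do different base models (Llama2-13B, GPT-4, Vicuna) perform on agricultural Q&A with RAG and/or fine-tuning?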

Key Findings

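Fine-tuning increases accuracy by over 6 p.p.
When combined with fine-tuning, RAG contributes a further 5 p.p. of accuracy, so the two gains are cumulative.
In one experiment, the fine-tuned model draws on information from across geographies, raising answer similarity from 47% to 72%.
GPT-4 consistently outperforms the other models, but its fine-tuning and inference costs are substantial.
RAG is effective when the retrieved data is contextually relevant (e.g., farm data) and tends to produce more concise answers than the base model.
The study presents a practical, industry-focused pipeline for building domain-specific AI copilots, from Q&A generation through model fine-tuning.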
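Reading "cumulative" in the abstract as additive, the headline accuracy numbers combine as shown below. Both figures are percentage points (p.p.), i.e. absolute differences in accuracy rather than relative changes; the 47% to 72% similarity result is worked the same way, with its relative change shown for contrast.

```latex
\begin{aligned}
\Delta_{\text{FT+RAG}} &= \Delta_{\text{FT}} + \Delta_{\text{RAG}} > 6\ \text{p.p.} + 5\ \text{p.p.} = 11\ \text{p.p.} \\
72\% - 47\% &= 25\ \text{p.p.}, \qquad \frac{72 - 47}{47} \approx 0.53 \ (\approx 53\%\ \text{relative})
\end{aligned}
```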

This review was generated by AI and reviewed by a human editor.