QUICK REVIEW

[Paper Review] Fine-tuning Language Models for Factuality

Katherine Tian, Eric Mitchell|arXiv (Cornell University)|Nov 14, 2023

Topic Modeling|Citations: 10

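One-sentence summary

This paper presents reference-based and reference-free factuality tuning approaches that use direct preference optimization (DPO) to improve long-form factuality without human labels, and shows that they improve factuality over RLHF and decoding-based methods on biography generation and medical QA tasks.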

ABSTRACT

The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reduction in factual error rate when generating biographies and answering medical questions, respectively.

Research motivation and objectives

Motivate and address the challenge of factual inaccuracies ('hallucinations') in large language models.
Propose a factuality-focused fine-tuning pipeline that uses automatically generated preference data without human labeling.
Compare reference-based and reference-free truthfulness estimators to guide learning.
Demonstrate that factuality-focused tuning can outperform RLHF and complement decoding-time factuality interventions.
Evaluate across biography generation and medical question-answering tasks to show generality and robustness.

Proposed method

Use Direct Preference Optimization (DPO) to fine-tune language models from preference pairs without explicit reward modeling or online sampling.
Construct preference data from unlabeled prompts by scoring candidate responses with truthfulness estimators (reference-based or reference-free) and selecting the more truthful response as the preferred option.
For reference-based truthfulness, extract atomic claims and check support using a fine-tuned fact-checking model against Wikipedia (FactScore).
For reference-free truthfulness, convert each atomic claim into a minimally ambiguous question and estimate model confidence by resampling answers to gauge uncertainty.
Train with DPO on the resulting preference data to push the model toward more truthful outputs.
Investigate compatibility with RLHF-based chat models and potential synergy with decoding-time factuality interventions like DOLA.
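
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The helper names (`claim_confidence`, `response_score`, `build_preference_pairs`, `dpo_loss`) are hypothetical. The sketch assumes that answers are binned by case-insensitive exact match, that a response's score is the mean of its per-claim scores, and that `beta=0.1` is a reasonable illustrative value. The loss itself is the standard per-pair DPO objective.

```python
import math
from collections import Counter
from itertools import combinations


def claim_confidence(sampled_answers):
    """Reference-free score for one claim: the fraction of resampled answers
    that agree with the most common answer (exact match after normalization,
    which is an assumption of this sketch)."""
    counts = Counter(a.strip().lower() for a in sampled_answers)
    return counts.most_common(1)[0][1] / len(sampled_answers)


def response_score(per_claim_scores):
    """Aggregate per-claim scores into one response-level score.
    Using the mean is an assumption; an empty response scores 0.0."""
    if not per_claim_scores:
        return 0.0
    return sum(per_claim_scores) / len(per_claim_scores)


def build_preference_pairs(prompt, scored_responses):
    """Turn (text, score) candidates for one prompt into DPO-style pairs,
    preferring the higher-scoring response. Ties carry no signal and are skipped."""
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(scored_responses, 2):
        if score_a == score_b:
            continue
        chosen, rejected = (text_a, text_b) if score_a > score_b else (text_b, text_a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)),
    where each log-ratio is the policy log-prob minus the reference log-prob."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy usage: three of four resampled answers agree, so confidence is 0.75.
confidence = claim_confidence(["1867", "1867", "1868", "1867"])
candidates = [("response A", response_score([confidence, 1.0])),
              ("response B", response_score([0.25, 0.5]))]
pairs = build_preference_pairs("Write a biography of ...", candidates)
```

In practice the log-probabilities passed to `dpo_loss` would be summed token log-probabilities of each response under the policy being trained and under a frozen reference model. A reference-based variant would swap `claim_confidence` for a per-claim support check against an external source.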

Experimental results

Research questions

RQ1: Can factuality in long-form generation be improved without human labels by learning from automatically generated preference rankings?
RQ2: How do reference-based and reference-free truthfulness estimators compare in guiding factuality tuning?
RQ3: Does factuality-focused fine-tuning complement or conflict with RLHF and decoding-time interventions?
RQ4: Is the approach effective across multiple domains, such as biographies and medical question answering, and for chat models?
RQ5: What qualitative changes occur in model outputs after factuality tuning (e.g., style, structure)?

Key findings

Model	Method	Biography #Correct	Biography #Incorrect	Biography %Correct	MedicalQA #Correct	MedicalQA %Correct
Llama-1	ITI	11.67	0	0.669	0	0.0
Llama-1	DOLA	11.75	0	0.754	0	0.0
Llama-1	SFT	13.78	12.16	0.568	10.75	0.631
Llama-1	FactTune-FS (ours)	14.81	0	0.812	10.88	0.450
Llama-1	FactTune-MC (ours)	10.59	0	0.783	12.31	0.646
Llama-2	ITI	18.50	0	0.760	10.97	0.730
Llama-2	DOLA	13.41	0	0.696	0	0.0
Llama-2	SFT	12.19	0	0.701	11.75	0.635
Llama-2	FactTune-FS (ours)	17.06	0	0.895	12.53	0.783
Llama-2	FactTune-MC (ours)	11.31	0	0.846	11.41	0.704

Factuality tuning using FactTune-FS (reference-based preferences) consistently improves factual accuracy vs RLHF and decoding baselines across biographies and medical QA.
FactTune-FS reduces factual errors and increases correct facts, achieving higher %Correct than baselines on both tasks.
FactTune-MC (reference-free, model-confidence preferences) also reduces error rates and improves factuality, offering a strong, scalable alternative without external references.
Factuality tuning can complement decoding-time interventions (e.g., DOLA), with mixed but often positive gains in factuality.
Factuality tuning also improves the factuality of an RLHF chat model (Llama-2-7b-Chat) when applied on top of it.
Human and GPT-4 evaluations correlate with FactScore improvements, indicating reduced reward overfitting and genuine factual improvements.
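
The "reduction in factual error rate" quoted in the abstract is a relative quantity, which is easy to confuse with a difference in %Correct. The sketch below shows the arithmetic, using the Biography %Correct values for Llama-2 SFT (0.701) and Llama-2 FactTune-FS (0.895) as listed in the table above. The function names are hypothetical. The resulting figure only illustrates the calculation and is not the paper's reported 58% number, which is measured against Llama-2-chat.

```python
def error_rate(pct_correct):
    """Error rate as the complement of the fraction of correct claims."""
    return 1.0 - pct_correct


def relative_error_reduction(baseline_pct_correct, tuned_pct_correct):
    """Fraction of the baseline's errors that the tuned model removes."""
    baseline_error = error_rate(baseline_pct_correct)
    if baseline_error == 0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_error - error_rate(tuned_pct_correct)) / baseline_error


# Llama-2 Biography rows from the table: SFT vs. FactTune-FS.
reduction = relative_error_reduction(0.701, 0.895)
print(f"relative error reduction: {reduction:.1%}")
```

With these inputs the error rate falls from 0.299 to 0.105, a relative reduction of roughly 65%, even though %Correct rises by only about 19 points.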

This review was generated by AI and checked by a human editor.