QUICK REVIEW

[Paper Review] Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

Yuji Zhang, Sha Li|arXiv (Cornell University)|2024. 07. 10.

Machine Learning in Healthcare | Citations: 5

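One-line summary

This paper identifies a phenomenon called knowledge overshadowing: in a prompt with multiple conditions, dominant conditions cause the others to be overlooked, so the LLM generates amalgamated hallucinations. To mitigate this, the authors propose inference-time detection and Self-Contrastive Decoding.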

ABSTRACT

Hallucination is often regarded as a major impediment for using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as "knowledge overshadowing": when we query knowledge from a language model with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained models and fine-tuned models, over a wide range of LM model families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular condition) and the length of dominant condition description, consistent with our derived generalization bound. Finally, we propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach showcases up to 82% F1 for hallucination anticipation and 11.2% to 39.4% hallucination control, with different models and datasets.

Research motivation and goals

Investigate why prompts with multiple conditions induce amalgamated hallucinations in LLMs even when training data is correct.
Characterize how data imbalance and condition length affect hallucination rates across model families and sizes.
Develop inference-time strategies to detect overshadowing and mitigate hallucinations without retraining.

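Proposed method

Defines knowledge overshadowing as p(y|AB) ≈ p(y|A) for a dominant condition A, which explains why the less-represented condition B is ignored.
Demonstrates the overshadowing phenomenon empirically in pretrained and fine-tuned models, across multiple tasks and model sizes.
Quantifies the relationship between imbalance ratio, condition length, and hallucination rate.
Connects the next-token prediction (NTP) loss with GSNR to derive a generalization bound relating overshadowing to model generalization.
Proposes PMI-based overshadowing detection together with an Escaping Penalty Mechanism (EPM) for detection.
Introduces Self-Contrastive Decoding (SCD) to reduce the bias toward dominant conditions at inference time.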
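
The definition p(y|AB) ≈ p(y|A) suggests a simple check: if removing condition B barely changes the output distribution, B is probably being ignored. The sketch below is only an illustration of that idea, not the paper's detector. The function names, the toy distributions, and the 0.05 threshold are all assumptions made for this example. It scores each candidate token with a conditional PMI, log p(y|AB) / p(y|A), and averages it under p(y|AB), which is the KL divergence between the two distributions.

```python
import math

def conditional_pmi(p_full, p_dominant):
    """Per-token log p(y|A,B) / p(y|A); all probabilities must be > 0."""
    return {tok: math.log(p_full[tok] / p_dominant[tok]) for tok in p_full}

def condition_influence(p_full, p_dominant):
    """Expected conditional PMI under p(y|A,B), i.e. KL(p_full || p_dominant)."""
    pmi = conditional_pmi(p_full, p_dominant)
    return sum(p_full[tok] * pmi[tok] for tok in p_full)

def is_overshadowed(p_full, p_dominant, threshold=0.05):
    """Flag B as overshadowed when adding it barely moves the distribution.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return condition_influence(p_full, p_dominant) < threshold

# Toy next-token distributions (made-up numbers, not results from the paper).
p_dominant = {"dominant_answer": 0.90, "correct_answer": 0.10}   # p(y | A)
p_ignored = {"dominant_answer": 0.88, "correct_answer": 0.12}    # p(y | A, B), B ignored
p_respected = {"dominant_answer": 0.20, "correct_answer": 0.80}  # p(y | A, B), B used

print(is_overshadowed(p_ignored, p_dominant))
print(is_overshadowed(p_respected, p_dominant))
```

In practice the two distributions would come from running the same model on the full prompt and on a prompt with condition B masked or removed. How the paper builds that second prompt is not described in this review.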

Experimental results

Research questions

RQ1: Does knowledge overshadowing occur across model families and sizes when prompts contain multiple conditions?
RQ2: How do data imbalance and condition length influence the hallucination rate in autoregressive LLMs?
RQ3: Can inference-time detection and decoding techniques anticipate and mitigate overshadowing-induced hallucinations without retraining?
RQ4: What theoretical insights relate overshadowing to generalization bounds in next-token prediction?

Key results

Knowledge overshadowing produces amalgamated hallucinations across multiple model families and sizes.
Hallucination rate increases with imbalance ratio, and larger models exhibit higher relative hallucination rates.
Longer dominant condition descriptions lead to higher hallucination rates, with curves steepening for smaller models.
A training-free overshadowing detector using PMI signals achieves up to 82% F1 on anticipation datasets.
Self-Contrastive Decoding reduces hallucination rates by 11.2% to 39.4% across datasets and models.
A theoretical generalization bound links overshadowing to GSNR and dominant-condition length.
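
To make the idea behind contrastive decoding concrete, here is a minimal sketch under stated assumptions. The review does not give the paper's exact SCD formula, so the code uses a generic contrastive adjustment instead: it boosts tokens that become more likely once the overshadowed condition is present and penalizes tokens that the dominant condition alone already favors. The function name `self_contrast`, the weight `alpha`, and the toy probabilities are hypothetical.

```python
import math

def self_contrast(p_full, p_dominant, alpha=1.0):
    """Re-weight p(y|A,B) against p(y|A).

    score(y) = (1 + alpha) * log p(y|A,B) - alpha * log p(y|A)
    This is a generic contrastive form, assumed here for illustration;
    alpha = 0 leaves the original distribution unchanged.
    """
    scores = {
        tok: (1.0 + alpha) * math.log(p_full[tok]) - alpha * math.log(p_dominant[tok])
        for tok in p_full
    }
    # Numerically stable softmax over the adjusted scores.
    top = max(scores.values())
    weights = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Toy distributions (made-up numbers): the full prompt still prefers the
# dominant answer, but less strongly than the dominant-only prompt does.
p_full = {"dominant_answer": 0.6, "correct_answer": 0.4}      # p(y | A, B)
p_dominant = {"dominant_answer": 0.9, "correct_answer": 0.1}  # p(y | A)

adjusted = self_contrast(p_full, p_dominant, alpha=1.0)
print(adjusted)  # correct_answer now holds about 0.8 of the probability mass
```

With `alpha=1.0` the adjusted distribution is proportional to p(y|A,B)² / p(y|A), so the token supported by condition B overtakes the dominant one in this toy case. A larger `alpha` contrasts more aggressively and can over-correct when B is not actually overshadowed.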

This review was generated by AI and reviewed by a human editor.