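[Paper Review] Hallucination of Multimodal Large Language Models: A Survey
This paper surveys hallucination in multimodal large language models (MLLMs), covering its causes, benchmarks, metrics, and mitigation strategies in detail, with the aim of improving model reliability.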
This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue. Additionally, we analyze the current challenges and limitations, formulating open questions that delineate potential pathways for future research. By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field. Through our thorough and in-depth review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing valuable insights and resources for researchers and practitioners alike. Resources are available at: https://github.com/showlab/Awesome-MLLM-Hallucination.
Research Motivation and Objectives
- Define and contextualize hallucination in MLLMs (vision+LLM) and its impact on reliability.
- Present a granular taxonomy of hallucination causes spanning data, model, training, and inference.
- Review existing benchmarks and metrics for evaluating cross-modal hallucinations.
- Survey mitigation strategies that address identified causes and improve grounding.
- Highlight open challenges and future research directions to advance robust MLLMs.
Proposed Approach
- Taxonomic organization of object hallucination into category, attribute, and relation types.
- Analysis of data-related, model-related, training-related, and inference-related causes.
- Compilation and discussion of hallucination benchmarks and metrics (e.g., CHAIR, POPE).
- Review of mitigation techniques aligned to root causes (data curation, model adjustments, training signals, inference interventions).
- Comparison with existing surveys and articulation of open questions to guide future work.
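
To make the CHAIR metric mentioned above concrete, here is a minimal sketch of how its two scores are commonly computed: the per-instance score is the fraction of mentioned objects that are absent from the image, and the per-sentence score is the fraction of captions containing at least one such object. The function name `chair_scores` and its inputs are hypothetical, and the sketch assumes object mentions have already been extracted from each caption and mapped to the same vocabulary as the ground-truth annotations; that extraction step is not shown.

```python
from typing import List, Set, Tuple


def chair_scores(
    mentioned_objects: List[Set[str]],
    present_objects: List[Set[str]],
) -> Tuple[float, float]:
    """Return (per-instance, per-sentence) CHAIR-style scores.

    mentioned_objects[k]: objects named in the k-th generated caption.
    present_objects[k]:   objects annotated as present in the k-th image.
    Both inputs are assumed to use the same object vocabulary.
    """
    if len(mentioned_objects) != len(present_objects):
        raise ValueError("inputs must have the same length")

    total_mentions = 0          # all object mentions across captions
    hallucinated_mentions = 0   # mentions of objects not in the image
    hallucinated_captions = 0   # captions with at least one such mention

    for mentioned, present in zip(mentioned_objects, present_objects):
        hallucinated = mentioned - present
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            hallucinated_captions += 1

    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = (
        hallucinated_captions / len(mentioned_objects)
        if mentioned_objects
        else 0.0
    )
    return chair_i, chair_s


# Toy example: the second caption mentions a "tv" that is not in the image.
mentioned = [{"dog", "frisbee"}, {"cat", "sofa", "tv"}]
present = [{"dog", "frisbee", "person"}, {"cat", "sofa"}]
chair_i, chair_s = chair_scores(mentioned, present)
```

Lower values indicate fewer hallucinated objects. Because the score depends on a fixed object vocabulary, it only captures category-level hallucination, not attribute or relation errors.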

Experimental Results
Research Questions
- RQ1: What are the primary sources of hallucination in multimodal LLMs (data, model, training, inference), and how do they manifest in cross-modal content?
- RQ2: How are hallucinations measured and benchmarked in MLLMs, and what mitigation strategies prove effective against each cause?
- RQ3: How do data quality, data quantity, model balance, and alignment interfaces contribute to cross-modal inaccuracies?
- RQ4: What are the gaps and open questions in current evaluation and mitigation methods for MLLM hallucinations?
Key Results
- Object hallucination in MLLMs is categorized into category, attribute, and relation types.
- Hallucination arises from data quantity, quality, and statistical bias, as well as model priors and alignment interfaces.
- Training and inference stages contribute via supervision signals, loss design, and attention dynamics during generation.
- A range of benchmarks and metrics (e.g., CHAIR, POPE) assess hallucination across generative and discriminative tasks.
- Mitigation strategies are linked to specific root causes, including data curation, model adjustments, auxiliary supervision, and decoding interventions.
- The survey provides a structured landscape and identifies open questions to guide future research.
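
On the discriminative side, POPE-style evaluation polls the model with yes/no questions about whether an object is present and scores the answers as binary classification. The sketch below is an illustration under stated assumptions, not the benchmark's official scoring code: the function name `pope_style_metrics` is hypothetical, model answers are assumed to be already normalized to "yes" or "no", and "yes" is treated as the positive class.

```python
from typing import Dict, Sequence


def pope_style_metrics(
    predictions: Sequence[str],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Score yes/no object-presence answers as binary classification."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("at least one question is required")

    # Treat "yes" as the positive class; anything else counts as "no".
    preds = [p.strip().lower() == "yes" for p in predictions]
    golds = [g.strip().lower() == "yes" for g in labels]

    tp = sum(p and g for p, g in zip(preds, golds))
    fp = sum(p and not g for p, g in zip(preds, golds))
    fn = sum(not p and g for p, g in zip(preds, golds))
    tn = sum(not p and not g for p, g in zip(preds, golds))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A high yes-ratio suggests a bias toward affirming object presence.
        "yes_ratio": sum(preds) / len(preds),
    }


# Toy example: the model wrongly affirms one absent object.
scores = pope_style_metrics(
    ["yes", "yes", "no", "yes"],
    ["yes", "no", "no", "yes"],
)
```

Reporting the yes-ratio alongside accuracy and F1 helps separate genuine grounding from a blanket tendency to answer "yes".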

This review was generated by AI and reviewed by a human editor.