QUICK REVIEW

[Paper Review] HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing

Ahmed Akl, Abdelwahed Khamis | arXiv (Cornell University) | Feb 21, 2026
Adversarial Robustness in Machine Learning | Cited by 0
One-Sentence Summary

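This paper proposes Hallucination Insensitivity Model Editing (HIME), a training-free, layer-adaptive weight-editing method that, guided by the Hallucination Insensitivity Score (HIS), suppresses object hallucinations in LVLMs while preserving pre-trained knowledge, and achieves substantial hallucination reductions across multiple backbones.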

ABSTRACT

Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal understanding capabilities, yet they remain prone to object hallucination, where models describe non-existent objects or attribute incorrect factual information, raising serious concerns for reliable real-world deployment. While fine-tuning is a commonly adopted mitigation strategy, its high computational cost and practical difficulty motivate the need for training-free alternatives, among which model editing has recently emerged as a promising direction. However, indiscriminate editing risks disrupting the rich implicit knowledge encoded in pre-trained LVLMs, leading to a fundamental question: how much intervention is necessary at each layer to suppress hallucinations while preserving pre-trained knowledge? To address this question, we present a systematic analysis of LVLM decoders built on three widely used large language model backbones (Qwen, LLaMA, and Vicuna), revealing clear layer-wise differences in susceptibility to object hallucination. Building on these insights, we introduce the Hallucination Insensitivity Score (HIS), a principled metric that quantifies each layer's sensitivity to hallucination and provides guidance for targeted intervention. Leveraging HIS, we propose Hallucination Insensitivity Model Editing (HIME), a simple yet effective layer-adaptive weight editing approach that selectively modifies latent features to suppress hallucinations while preserving pre-trained knowledge. Extensive experiments demonstrate that HIME reduces hallucinations by an average of 61.8% across open-ended generation benchmarks, including CHAIR, MME, and GPT-4V-aided evaluation, without introducing additional parameters, inference-time latency, or computational overhead.

Motivation and Objectives

  • Identify layer-wise variation in hallucination susceptibility across LVLM decoders (Qwen, LLaMA, Vicuna backbones).
  • Introduce Hallucination Insensitivity Score (HIS) to quantify layer sensitivity to hallucination.
  • Develop Hallucination Insensitivity Model Editing (HIME) to selectively edit latent directions and suppress hallucinations.
  • Demonstrate that HIME reduces object hallucinations by ~61.8% on open-ended generation benchmarks without extra parameters or overhead.

Proposed Method

  • Derive HIS by contrasting attention distributions for truthful vs hallucinated samples using KL divergence across decoder layers.
  • Compute layer-wise attention-guided representations from truthful and hallucinated samples, take their difference Z_l, and apply SVD to Z_l to identify a low-rank hallucination subspace.
  • Construct a weighted null-space projector N_l = I - HIS_c_l * V_{l,k} V_{l,k}^T, where V_{l,k} spans the top-k hallucination subspace of layer l, and use it to selectively edit the MLP weights in targeted layers.
  • Load the edited weights into the LVLM decoder for inference; this adds no parameters and no inference-time cost.
  • Edit operations are layer-adaptive and controlled by HIS_c_l, enabling smooth intervention rather than full, uniform editing.
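
The steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names (`kl_divergence`, `hallucination_subspace`, `weighted_nullspace_projector`, `edit_weight`) are hypothetical, the subspace is assumed to be given by the top-k right singular vectors of the difference matrix, the projector is assumed to act on the output side of the MLP weight, and `strength` is a placeholder in [0, 1] standing in for the HIS-derived coefficient, whose exact mapping from the KL values is not specified in this summary.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, e.g. attention weights."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def hallucination_subspace(feats_truthful, feats_halluc, k):
    """Top-k right singular vectors of Z = hallucinated - truthful features.

    Inputs have shape (n_samples, d); the result has shape (d, k) with
    orthonormal columns. Using right singular vectors is an assumption.
    """
    z = np.asarray(feats_halluc) - np.asarray(feats_truthful)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:k].T

def weighted_nullspace_projector(v_k, strength):
    """N = I - strength * V V^T; strength=0 is no edit, strength=1 is full."""
    d = v_k.shape[0]
    return np.eye(d) - strength * (v_k @ v_k.T)

def edit_weight(w, projector):
    """Apply the projector on the output side of a (d, d_in) weight (assumed)."""
    return projector @ w

# Synthetic demo data; a real pipeline would use features from an LVLM decoder.
rng = np.random.default_rng(0)
d, d_in, n, k = 16, 8, 32, 3
feats_truthful = rng.normal(size=(n, d))
feats_halluc = feats_truthful + rng.normal(size=(n, d))

# Per-layer contrast between two (toy) attention distributions.
layer_kl = kl_divergence([0.7, 0.2, 0.1], [0.3, 0.3, 0.4])

v_k = hallucination_subspace(feats_truthful, feats_halluc, k)
w = rng.normal(size=(d, d_in))
strength = 0.5  # hypothetical stand-in for the layer's HIS-derived coefficient
w_edited = edit_weight(w, weighted_nullspace_projector(v_k, strength))
```

Because the edit produces an ordinary weight matrix of the same shape, it can be computed once offline and saved with the checkpoint, which is consistent with the claim of no extra parameters or inference-time cost.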

Experimental Results

Research Questions

  • RQ1: How does object hallucination susceptibility vary across decoder layers in LVLMs?
  • RQ2: Can a layer-wise metric (HIS) guide targeted, training-free model editing to suppress hallucinations while preserving knowledge?
  • RQ3: Does HIME reduce object hallucinations across multiple LVLM backbones and benchmarks without adding parameters or latency?
  • RQ4: What is the impact of HIME on downstream perception tasks and overall model utility?

Key Findings

LVLMs                            CHAIR_S        CHAIR_I         BLEU
LLaVA-1.5 Original               181.67±2.36    118.33±12.47    104.44±5.67
LLaVA-1.5 Nullu                  190.00±4.08    121.11±7.74     105.56±4.20
LLaVA-1.5 HIME                   195.00±0.00    155.56±4.81     123.33±0.00
QWen2-VL-8B-Instruct Original    20.8           5.36            11.16
QWen2-VL-8B-Instruct HIME        6.00           3.44            8.89

  • HIME reduces object hallucination by an average of 61.8% across open-ended generation benchmarks (CHAIR, MME, GPT-4V-aided evaluation).
  • HIME achieves this without adding parameters, inference-time latency, or computational overhead.
  • HIS provides a layer-wise sensitivity measure that guides targeted edits, with smooth interpolation between no-edit and full-edit regimes.
  • Editing is performed offline via a weighted null-space projection onto a top-k hallucination subspace per layer, preserving pre-trained knowledge.
  • Experiments across LLaVA-1.5, MiniGPT-4, mPLUG-Owl2 and Qwen backbones show consistent improvements in hallucination metrics and perception tasks.
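
The "smooth interpolation" property can be made concrete with a short derivation. It is a sketch under two assumptions not stated explicitly in this summary: the columns of V (the top-k hallucination directions of a layer) are orthonormal, as they would be if taken from an SVD, and s denotes the HIS-derived coefficient, restricted to [0, 1].

```latex
\[
N = I - s\,V V^{\top}, \qquad V^{\top} V = I_k, \qquad s \in [0, 1]
\]
\[
x = x_{\parallel} + x_{\perp}, \quad x_{\parallel} = V V^{\top} x
\;\Longrightarrow\;
N x = x_{\perp} + (1 - s)\, x_{\parallel}
\]
```

Setting s = 0 gives N = I (no edit), s = 1 gives the orthogonal projector onto the null space of the hallucination directions (full edit), and intermediate values shrink only the component inside that subspace while leaving the orthogonal component unchanged.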


This review was generated by AI and reviewed by human editors.