QUICK REVIEW

[Paper Review] HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing

Ahmed Akl, Abdelwahed Khamis|arXiv (Cornell University)|Feb 21, 2026

Adversarial Robustness in Machine Learning | Cited by 0

One-Sentence Summary

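The paper proposes Hallucination Insensitivity Model Editing (HIME), a training-free, layer-adaptive weight-editing method that, guided by the Hallucination Insensitivity Score (HIS), suppresses object hallucinations in LVLMs while preserving pre-trained knowledge, achieving substantial hallucination reduction across multiple backbones.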

ABSTRACT

Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal understanding capabilities, yet they remain prone to object hallucination, where models describe non-existent objects or attribute incorrect factual information, raising serious concerns for reliable real-world deployment. While fine-tuning is a commonly adopted mitigation strategy, its high computational cost and practical difficulty motivate the need for training-free alternatives, among which model editing has recently emerged as a promising direction. However, indiscriminate editing risks disrupting the rich implicit knowledge encoded in pre-trained LVLMs, leading to a fundamental question: how much intervention is necessary at each layer to suppress hallucinations while preserving pre-trained knowledge? To address this question, we present a systematic analysis of LVLM decoders built on three widely used large language model backbones (Qwen, LLaMA, and Vicuna), revealing clear layer-wise differences in susceptibility to object hallucination. Building on these insights, we introduce the Hallucination Insensitivity Score (HIS), a principled metric that quantifies each layer's sensitivity to hallucination and provides guidance for targeted intervention. Leveraging HIS, we propose Hallucination Insensitivity Model Editing (HIME), a simple yet effective layer-adaptive weight editing approach that selectively modifies latent features to suppress hallucinations while preserving pre-trained knowledge. Extensive experiments demonstrate that HIME reduces hallucinations by an average of 61.8% across open-ended generation benchmarks, including CHAIR, MME, and GPT-4V-aided evaluation, without introducing additional parameters, inference-time latency, or computational overhead.

Research Motivation and Objectives

Identify layer-wise variation in hallucination susceptibility across LVLM decoders (Qwen, LLaMA, Vicuna backbones).
Introduce Hallucination Insensitivity Score (HIS) to quantify layer sensitivity to hallucination.
Develop Hallucination Insensitivity Model Editing (HIME) to selectively edit latent directions and suppress hallucinations.
Demonstrate that HIME reduces object hallucinations by an average of 61.8% on open-ended generation benchmarks without extra parameters or overhead.

Proposed Method

Derive HIS by contrasting, at each decoder layer, the attention distributions of truthful and hallucinated samples using KL divergence.
Compute layer-wise attention-guided representations from truthful and hallucinated samples, form their difference Z_l, and apply SVD to Z_l to identify a low-rank hallucination subspace.
Construct a weighted null-space projector N_l = I - HIS_c_l * V_l,k V_l,k^T that selectively edits MLP weights in targeted layers.
Apply the edited weights to LVLM decoders with zero additional parameters or inference-time costs by reloading the edited weights for inference.
Edit operations are layer-adaptive and controlled by HIS_c_l, enabling smooth intervention rather than full, uniform editing.
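
The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the row-vector feature layout, the direction of the difference (hallucinated minus truthful), and the choice to left-multiply the MLP weight by N_l are all hypothetical, and `his` simply stands in for the per-layer coefficient written HIS_c_l above. How the paper turns the layer-wise KL values into that coefficient is not specified here, so the sketch only shows the KL comparison itself.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions (e.g. attention weights)."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()  # renormalize after smoothing
    return float(np.sum(p * np.log(p / q)))


def hallucination_subspace(truthful, hallucinated, k):
    """Top-k right singular vectors of the feature difference Z_l.

    truthful, hallucinated: arrays of shape (n_samples, d) for one layer.
    Returns V_{l,k} with shape (d, k) and orthonormal columns.
    """
    z = hallucinated - truthful  # Z_l (assumed sign convention)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:k].T


def weighted_projector(v_k, his):
    """N_l = I - his * V_{l,k} V_{l,k}^T."""
    d = v_k.shape[0]
    return np.eye(d) - his * (v_k @ v_k.T)


def edit_weight(w, v_k, his):
    """Edit an MLP weight of shape (d, d_out), assuming features act as h @ w.

    Then h @ (N @ w) == (h @ N) @ w, i.e. the input is damped along the
    hallucination subspace before the original weight is applied.
    """
    return weighted_projector(v_k, his) @ w


# Toy usage with synthetic features (illustrative values only).
rng = np.random.default_rng(0)
truthful = rng.normal(size=(32, 8))
hallucinated = truthful + rng.normal(size=(32, 2)) @ rng.normal(size=(2, 8))
v_k = hallucination_subspace(truthful, hallucinated, k=2)
w_edited = edit_weight(rng.normal(size=(8, 16)), v_k, his=0.5)
```

Because the edit is a single matrix product applied once to the stored weights, the edited layer has the same shape as the original, which is consistent with the claim of no extra parameters or inference-time cost.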

Experimental Results

Research Questions

RQ1: How does object hallucination susceptibility vary across decoder layers in LVLMs?
RQ2: Can a layer-wise metric (HIS) guide targeted, training-free model editing to suppress hallucinations while preserving knowledge?
RQ3: Does HIME reduce object hallucinations across multiple LVLM backbones and benchmarks without adding parameters or latency?
RQ4: What is the impact of HIME on downstream perception tasks and overall model utility?

Key Findings

Model	Method	CHAIR_S	CHAIR_I	BLEU
LLaVA-1.5	Original	181.67±2.36	118.33±12.47	104.44±5.67
LLaVA-1.5	Nullu	190.00±4.08	121.11±7.74	105.56±4.20
LLaVA-1.5	HIME	195.00±0.00	155.56±4.81	123.33±0.00
Qwen2-VL-8B-Instruct	Original	20.8	5.36	11.16
Qwen2-VL-8B-Instruct	HIME	6.00	3.44	8.89
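
For readers unfamiliar with the CHAIR columns, the sketch below shows how the two scores are commonly defined: CHAIR_S is the fraction of captions containing at least one hallucinated object, and CHAIR_I is the fraction of mentioned objects that are hallucinated, so lower means fewer hallucinations for both. This is a simplified illustration with a hypothetical function name and input format; standard CHAIR evaluation additionally extracts object mentions from free text and maps synonyms to a fixed category list, which is skipped here.

```python
def chair_scores(mentioned_per_caption, truth_per_image):
    """Return (CHAIR_S, CHAIR_I) as fractions in [0, 1].

    mentioned_per_caption: list of lists of object names found in each caption.
    truth_per_image: list of sets of object names actually present in each image.
    """
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for mentioned, truth in zip(mentioned_per_caption, truth_per_image):
        wrong = [obj for obj in mentioned if obj not in truth]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(wrong)
        hallucinated_captions += 1 if wrong else 0
    n_captions = len(mentioned_per_caption)
    chair_s = hallucinated_captions / n_captions if n_captions else 0.0
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    return chair_s, chair_i


# Toy usage: the second caption mentions a "tv" that is not in the image.
captions = [["dog", "frisbee"], ["cat", "sofa", "tv"]]
truths = [{"dog", "frisbee"}, {"cat", "sofa"}]
chair_s, chair_i = chair_scores(captions, truths)
```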

HIME reduces object hallucination by an average of 61.8% across open-ended generation benchmarks (CHAIR, MME, GPT-4V-aided evaluation).
HIME achieves this without adding parameters, inference-time latency, or computational overhead.
HIS provides a layer-wise sensitivity measure that guides targeted edits, with smooth interpolation between no-edit and full-edit regimes.
Editing is performed offline via a weighted null-space projection onto a top-k hallucination subspace per layer, preserving pre-trained knowledge.
Experiments across LLaVA-1.5, MiniGPT-4, mPLUG-Owl2 and Qwen backbones show consistent improvements in hallucination metrics and perception tasks.
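
The "smooth interpolation" in the findings above can be made explicit with a short derivation. It assumes that the layer coefficient (written HIS_c_l in the method description, abbreviated here as \alpha_l) lies in [0, 1] and that V_{l,k} has orthonormal columns, as the right singular vectors from an SVD do; both are assumptions of this sketch rather than statements taken from the paper.

```latex
% Assumptions: \alpha_l \in [0,1] stands for the layer coefficient HIS_c_l,
% and V_{l,k}^{\top} V_{l,k} = I_k.
\begin{align*}
N_l &= I - \alpha_l \, V_{l,k} V_{l,k}^{\top} \\
v \in \operatorname{span}(V_{l,k}) &\;\Rightarrow\; N_l v = v - \alpha_l v = (1 - \alpha_l)\, v \\
u \perp \operatorname{span}(V_{l,k}) &\;\Rightarrow\; N_l u = u \\
\alpha_l = 0 &\;\Rightarrow\; N_l = I \quad \text{(no edit)} \\
\alpha_l = 1 &\;\Rightarrow\; N_l = I - V_{l,k} V_{l,k}^{\top} \quad \text{(full null-space projection)}
\end{align*}
```

Under these assumptions, components inside the hallucination subspace are scaled by 1 - \alpha_l while everything orthogonal to it passes through unchanged, which is why intermediate coefficients damp rather than remove those directions.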

This review was generated by AI and reviewed by a human editor.