QUICK REVIEW

[Paper Review] HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing

Ahmed Akl, Abdelwahed Khamis|arXiv (Cornell University)|2026. 02. 21.

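Adversarial Robustness in Machine Learning | Citations: 0

One-Line Summary

The paper proposes Hallucination Insensitivity Model Editing (HIME), a training-free, layer-adaptive weight editing method guided by the Hallucination Insensitivity Score (HIS). It suppresses object hallucination in LVLMs while preserving pre-trained knowledge, reducing hallucinations by an average of 61.8% on open-ended generation benchmarks across several backbones.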

ABSTRACT

Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal understanding capabilities, yet they remain prone to object hallucination, where models describe non-existent objects or attribute incorrect factual information, raising serious concerns for reliable real-world deployment. While fine-tuning is a commonly adopted mitigation strategy, its high computational cost and practical difficulty motivate the need for training-free alternatives, among which model editing has recently emerged as a promising direction. However, indiscriminate editing risks disrupting the rich implicit knowledge encoded in pre-trained LVLMs, leading to a fundamental question: how much intervention is necessary at each layer to suppress hallucinations while preserving pre-trained knowledge? To address this question, we present a systematic analysis of LVLM decoders built on three widely used large language model backbones-Qwen, LLaMA, and Vicuna-revealing clear layer-wise differences in susceptibility to object hallucination. Building on these insights, we introduce the Hallucination Insensitivity Score (HIS), a principled metric that quantifies each layer's sensitivity to hallucination and provides guidance for targeted intervention. Leveraging HIS, we propose Hallucination Insensitivity Model Editing (HIME), a simple yet effective layer-adaptive weight editing approach that selectively modifies latent features to suppress hallucinations while preserving pre-trained knowledge. Extensive experiments demonstrate that HIME reduces hallucinations by an average of 61.8% across open-ended generation benchmarks, including CHAIR, MME, and GPT-4V-aided evaluation, without introducing additional parameters, inference-time latency, or computational overhead.

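Motivation and Objectives

Identify layer-wise differences in susceptibility to object hallucination in LVLM decoders built on Qwen, LLaMA, and Vicuna backbones.
Introduce HIS to quantify each layer's sensitivity to hallucination.
Develop HIME, which selectively edits latent directions to suppress hallucinations.
Demonstrate that HIME reduces object hallucination by an average of 61.8% on open-ended generation benchmarks without additional parameters or overhead.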

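Proposed Method

Derive HIS by using KL divergence to contrast the attention distributions of truthful and hallucinated samples across decoder layers.
Compute layer-wise attention-guided representations from truthful and hallucinated samples, take their difference Z_l, and apply SVD to identify a low-rank hallucination subspace.
Construct a weighted null-space projector, N_l = I - HIS_c_l * V_l,k V_l,k^T, that selectively edits the MLP weights at the target layers.
Reload the edited weights into the LVLM decoder, adding no parameters and no inference-time cost.
The edit strength is controlled per layer by HIS_c_l, which allows a soft intervention rather than a full edit.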
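
As a rough illustration of the steps above, the following NumPy sketch contrasts two attention distributions with KL divergence, extracts a top-k subspace from feature differences via SVD, and applies the soft projector N_l = I - c * V_l,k V_l,k^T to a weight matrix. The function names, the array shapes, the direction of the difference (hallucinated minus truthful), and the choice to right-multiply the weight are all assumptions made for illustration. The coefficient c stands in for HIS_c_l and is passed in as a plain scalar, because how the KL values are mapped to HIS is not specified in this summary.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same positions."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()  # renormalize after smoothing
    return float(np.sum(p * np.log(p / q)))


def top_k_subspace(z_truth, z_halluc, k):
    """Top-k right singular vectors of the feature difference Z_l.

    Both inputs are assumed to have shape (n_samples, d).
    Returns V_{l,k} with shape (d, k) and orthonormal columns.
    """
    z_diff = z_halluc - z_truth  # assumed sign convention
    _, _, vt = np.linalg.svd(z_diff, full_matrices=False)
    return vt[:k].T


def soft_projector(v_k, coeff):
    """N_l = I - coeff * V_{l,k} V_{l,k}^T, with coeff a scalar in [0, 1]."""
    d = v_k.shape[0]
    return np.eye(d) - coeff * (v_k @ v_k.T)


def edit_weight(w, projector):
    """Fold the projector into a weight of shape (d_out, d_in).

    Right-multiplying (an assumption here) means inputs are projected
    before the original weight is applied: (W N) x = W (N x).
    """
    return w @ projector


# Toy example with random features for one hypothetical layer.
rng = np.random.default_rng(0)
z_truth = rng.normal(size=(16, 8))
z_halluc = rng.normal(size=(16, 8))
v_k = top_k_subspace(z_truth, z_halluc, k=2)
w = rng.normal(size=(4, 8))
w_edited = edit_weight(w, soft_projector(v_k, coeff=0.5))
```

With c = 0 the weight is unchanged, and with c = 1 the top-k directions are removed entirely. Intermediate values scale those directions by 1 - c, which corresponds to the soft intervention described above. The edited matrix has the same shape as the original, so no parameters are added.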

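Experimental Results

Research Questions

RQ1. How does susceptibility to object hallucination vary across decoder layers in LVLMs?
RQ2. Can a layer-wise metric such as HIS guide targeted, training-free model editing that suppresses hallucinations without harming knowledge preservation?
RQ3. Does HIME reduce object hallucination across multiple LVLM backbones and benchmarks without additional parameters or latency?
RQ4. How does HIME affect downstream perception tasks and overall model utility?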

Key Results

LVLMs	CHAIR_S	CHAIR_I	BLEU
LLaVA-1.5 Original	181.67±2.36	118.33±12.47	104.44±5.67
LLaVA-1.5 Nullu	190.00±4.08	121.11±7.74	105.56±4.20
LLaVA-1.5 HIME	195.00±0.00	155.56±4.81	123.33±0.00
QWen2-VL-8B-Instruct Original	20.8	5.36	11.16
QWen2-VL-8B-Instruct HIME	6.00	3.44	8.89
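
To make the size of the change in the two QWen2-VL-8B-Instruct rows concrete, the short sketch below computes the relative reduction for CHAIR_S and CHAIR_I from the values in the table. The helper name `relative_reduction` is hypothetical, and the calculation assumes that lower CHAIR values indicate fewer hallucinated objects. The reported 61.8% figure is an average across benchmarks and is not reproduced by these two rows alone.

```python
def relative_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


# (Original, HIME) values copied from the QWen2-VL-8B-Instruct rows.
rows = {"CHAIR_S": (20.8, 6.00), "CHAIR_I": (5.36, 3.44)}
reductions = {name: relative_reduction(b, a) for name, (b, a) in rows.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower")
# CHAIR_S: 71.2% lower
# CHAIR_I: 35.8% lower
```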

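HIME reduces object hallucination by an average of 61.8% on open-ended generation benchmarks (CHAIR, MME, and GPT-4V-aided evaluation).
HIME achieves this without adding parameters, inference-time latency, or computational overhead.
HIS provides a layer-wise sensitivity metric that guides targeted editing and allows smooth interpolation between the no-editing and full-editing regimes.
Editing is performed offline as a weighted null-space projection onto each layer's top-k hallucination subspace, which preserves pre-trained knowledge.
Experiments on LLaVA-1.5, MiniGPT-4, mPLUG-Owl2, and Qwen backbones show consistent improvements on hallucination metrics and perception tasks.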


This review was generated by AI and checked by a human editor.