QUICK REVIEW

[Paper Review] HeSS: Head Sensitivity Score for Sparsity Redistribution in VGGT

Yongsung Kim, Wooseok Song | arXiv (Cornell University) | Mar 26, 2026

Visual Attention and Saliency Detection | Citations: 0

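One-line summary

This paper introduces HeSS to quantify head-wise sensitivity in VGGT's global attention, then uses HeSS-guided sparsification to reallocate the attention budget across heads and improve robustness under high sparsity. It shows that attention heads differ in sensitivity and that reallocation preserves performance better than uniform masking.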

ABSTRACT

Visual Geometry Grounded Transformer (VGGT) has advanced 3D vision, yet its global attention layers suffer from quadratic computational costs that hinder scalability. Several sparsification-based acceleration techniques have been proposed to alleviate this issue, but they often suffer from substantial accuracy degradation. We hypothesize that the accuracy degradation stems from the heterogeneity in head-wise sparsification sensitivity, as the existing methods apply a uniform sparsity pattern across all heads. Motivated by this hypothesis, we present a two-stage sparsification pipeline that effectively quantifies and exploits head-wise sparsification sensitivity. In the first stage, we measure head-wise sparsification sensitivity using a novel metric, the Head Sensitivity Score (HeSS), which approximates the Hessian with respect to two distinct error terms on a small calibration set. In the inference stage, we perform HeSS-Guided Sparsification, leveraging the pre-computed HeSS to reallocate the total attention budget, assigning denser attention to sensitive heads and sparser attention to more robust ones. We demonstrate that HeSS effectively captures head-wise sparsification sensitivity and empirically confirm that attention heads in the global attention layers exhibit heterogeneous sensitivity characteristics. Extensive experiments further show that our method effectively mitigates performance degradation under high sparsity, demonstrating strong robustness across varying sparsification levels. Code is available at https://github.com/libary753/HeSS.
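The calibration stage described in the abstract can be pictured with a minimal sketch. It assumes the Hessian is approximated by the diagonal empirical Fisher: the squared per-sample gradients of one error term with respect to a head's query projection parameters, summed per head and averaged over the calibration set. That choice, the function name `fisher_head_scores`, and the nested-list gradient layout are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def fisher_head_scores(
    grads: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Per-head sensitivity from a diagonal empirical Fisher (assumed form).

    grads[n][h] is the gradient of one error term (for example the camera
    pose error) with respect to the query projection parameters of head h,
    evaluated on calibration sample n. This layout is hypothetical.
    """
    if not grads:
        raise ValueError("need at least one calibration sample")
    num_samples = len(grads)
    num_heads = len(grads[0])
    scores = [0.0] * num_heads
    for sample in grads:
        if len(sample) != num_heads:
            raise ValueError("every sample must cover the same heads")
        for head, grad in enumerate(sample):
            # Sum of squared gradient entries: the trace of the diagonal
            # empirical Fisher block that belongs to this head.
            scores[head] += sum(g * g for g in grad)
    # Average over the calibration set.
    return [s / num_samples for s in scores]
```

Running the same function once per error term would give one score vector for camera pose and one for the point cloud, which can then be combined per head.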

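Motivation and objectives

Motivate head-aware sparsification in VGGT by recognizing that head-wise sensitivity in global attention is heterogeneous.
Define a per-head sensitivity metric (HeSS) based on Hessian/Fisher information associated with camera pose and point cloud errors.
Develop HeSS-Guided Sparsification, which uses HeSS to reallocate attention budgets across heads.
Show improved robustness and performance over uniform sparsification methods across different sparsity levels.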

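Proposed method

A two-stage sparsification pipeline: calibration, which computes HeSS, followed by inference with HeSS-guided masking.
Compute each head's HeSS from a Hessian approximation for two error terms, camera pose and point cloud, using a Fisher Information Matrix built from derivatives with respect to the query projection parameters.
Combine each head's camera (cam) and point cloud (pc) sensitivities in a weighted sum to form the final per-head HeSS.
An iterative budget capping procedure, in the style of water-filling, that redistributes each layer's total attention budget across heads according to HeSS.
At inference, apply head-wise masked block-sparse attention according to the final per-head budgets, keeping attention denser for sensitive heads and sparser for robust ones.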
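The score combination and budget reallocation steps above can be sketched as follows. This is an illustrative reading, not the paper's algorithm: it assumes each budget is the fraction of attention blocks a head keeps (at most `cap = 1.0`), that budgets are proportional to HeSS, and that any head whose share exceeds the cap is clamped while the leftover budget is re-split among the remaining heads. The names `combine_scores`, `allocate_budgets`, and `cam_weight`, and the sum-to-one normalization, are hypothetical.

```python
from typing import List, Sequence


def combine_scores(
    cam: Sequence[float], pc: Sequence[float], cam_weight: float = 0.5
) -> List[float]:
    """Weighted sum of normalized camera and point cloud sensitivities."""
    cam_total, pc_total = sum(cam), sum(pc)
    return [
        cam_weight * c / cam_total + (1.0 - cam_weight) * p / pc_total
        for c, p in zip(cam, pc)
    ]


def allocate_budgets(
    scores: Sequence[float], mean_density: float, cap: float = 1.0
) -> List[float]:
    """Split a layer's attention budget across heads with iterative capping.

    mean_density is the average kept fraction per head, so the layer's
    total budget is mean_density * number_of_heads.
    """
    if not 0.0 <= mean_density <= cap:
        raise ValueError("mean_density must lie in [0, cap]")
    if any(s <= 0.0 for s in scores):
        raise ValueError("scores must be positive")
    remaining = mean_density * len(scores)
    budgets = [0.0] * len(scores)
    free = set(range(len(scores)))  # heads that have not hit the cap
    while free:
        total = sum(scores[h] for h in free)
        share = {h: remaining * scores[h] / total for h in free}
        over = [h for h in free if share[h] > cap]
        if not over:
            # No head overflows: accept the proportional split.
            for h in free:
                budgets[h] = share[h]
            break
        for h in over:
            # Clamp the overflowing head and hand the rest back.
            budgets[h] = cap
            remaining -= cap
            free.remove(h)
    return budgets
```

For example, with scores `[8, 1, 1]` and a mean density of 0.5, a plain proportional split would give the first head a density of 1.2, which is not realizable. The capping loop clamps it to 1.0 and divides the remaining 0.5 evenly between the other two heads, so the layer's total budget is unchanged.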

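Experimental results

Research questions

RQ1: Do heads across VGGT's global attention layers exhibit heterogeneous sparsification sensitivity?
RQ2: Can a per-head sensitivity score (HeSS) effectively guide sparsification and preserve performance under high sparsity?
RQ3: Does HeSS-guided sparsification outperform uniform sparsification methods (e.g., SparseVGGT) and ViT sparsification on 3D vision tasks?
RQ4: How does combining the camera pose and point cloud errors in HeSS affect robustness across sparsity levels?
RQ5: Does the proposed method generalize to related 3D vision transformers such as pi3?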

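Key findings

HeSS makes the non-uniform distribution of head sensitivity explicit, showing that only a subset of heads contributes substantially to the impact of sparsification.
HeSS-guided sparsification consistently improves performance under high sparsity on camera pose estimation and multi-view stereo benchmarks.
The method outperforms the uniform sparsification and ViT sparsification baselines and preserves geometric reconstruction quality at high sparsity.
Combining both error components in HeSS (camera pose and point cloud) yields robust performance across sparsity levels, unlike using a single component.
Budget reallocation requires iterative capping to achieve an effective sparsity distribution while maintaining performance.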

This review was generated by AI and checked by a human editor.