QUICK REVIEW

[Paper Review] HeSS: Head Sensitivity Score for Sparsity Redistribution in VGGT

Yongsung Kim, Wooseok Song | arXiv (Cornell University) | Mar 26, 2026

Visual Attention and Saliency Detection | Cited by 0

One-Sentence Summary

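This paper introduces HeSS to quantify how sensitive each attention head in VGGT's global attention is to sparsification, and uses HeSS-guided sparsification to redistribute the attention budget across heads, improving robustness at high sparsity. The results show that attention heads differ in their sensitivity and that redistributing the budget preserves performance better than uniform masking.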

ABSTRACT

Visual Geometry Grounded Transformer (VGGT) has advanced 3D vision, yet its global attention layers suffer from quadratic computational costs that hinder scalability. Several sparsification-based acceleration techniques have been proposed to alleviate this issue, but they often suffer from substantial accuracy degradation. We hypothesize that the accuracy degradation stems from the heterogeneity in head-wise sparsification sensitivity, as the existing methods apply a uniform sparsity pattern across all heads. Motivated by this hypothesis, we present a two-stage sparsification pipeline that effectively quantifies and exploits head-wise sparsification sensitivity. In the first stage, we measure head-wise sparsification sensitivity using a novel metric, the Head Sensitivity Score (HeSS), which approximates the Hessian with respect to two distinct error terms on a small calibration set. In the inference stage, we perform HeSS-Guided Sparsification, leveraging the pre-computed HeSS to reallocate the total attention budget, assigning denser attention to sensitive heads and sparser attention to more robust ones. We demonstrate that HeSS effectively captures head-wise sparsification sensitivity and empirically confirm that attention heads in the global attention layers exhibit heterogeneous sensitivity characteristics. Extensive experiments further show that our method effectively mitigates performance degradation under high sparsity, demonstrating strong robustness across varying sparsification levels. Code is available at https://github.com/libary753/HeSS.

Motivation and Goals

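Improve VGGT sparsification by recognizing that attention heads in the global attention layers differ in their sensitivity to sparsification.
Define a per-head sensitivity metric (HeSS) based on a Hessian / Fisher-information approximation of the camera pose and point cloud errors.
Develop HeSS-guided sparsification, which uses HeSS to redistribute the attention budget across heads.
Demonstrate robustness and better performance than uniform sparsification methods across different sparsity levels.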
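Why a Hessian measures sensitivity can be seen from a second-order expansion of an error term around the trained weights. The sketch below is an illustration under stated assumptions, not the paper's exact definition: θ_h denotes head h's query-projection parameters, the first-order term is assumed small on the calibration set, the Hessian is replaced by the empirical Fisher, each Fisher matrix is reduced to its trace, and the two normalized terms are mixed with a hypothetical weight λ.

$$
E(\theta_h + \Delta\theta_h) - E(\theta_h) \approx \nabla_{\theta_h} E^\top \Delta\theta_h + \tfrac{1}{2}\,\Delta\theta_h^\top H_h\,\Delta\theta_h,
\qquad
H_h \approx F_h = \frac{1}{N}\sum_{n=1}^{N} g_{h,n}\,g_{h,n}^\top,\quad g_{h,n} = \nabla_{\theta_h} E_n
$$

$$
\hat{s}_h = \frac{\operatorname{tr}(F_h)}{\sum_{k}\operatorname{tr}(F_k)},\qquad
\operatorname{tr}(F_h) = \frac{1}{N}\sum_{n=1}^{N}\lVert g_{h,n}\rVert_2^2,\qquad
\mathrm{HeSS}_h \approx \lambda\,\hat{s}^{\,\text{pose}}_h + (1-\lambda)\,\hat{s}^{\,\text{pts}}_h
$$

A large trace means that small changes to that head's parameters move the error sharply, which is the intuition for giving such a head a denser attention budget.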

Proposed Method

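A two-stage sparsification pipeline: a calibration stage that computes HeSS, followed by HeSS-guided masking at inference time.
Compute each head's HeSS by approximating the Hessian of two error terms (camera pose and point cloud) with respect to the query-projection parameters, using the Fisher information matrix as the approximation.
Combine each head's camera-pose and point-cloud sensitivities with weights to form the final per-head HeSS.
Redistribute each layer's total attention budget across its heads according to HeSS, using water-filling-style iterative budget capping.
At inference, apply head-wise masked block-sparse attention according to the final per-head budgets, so that sensitive heads keep denser attention and robust heads are masked more aggressively.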
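The per-head scoring and weighted combination in the list above can be sketched as follows. This is a minimal sketch under assumptions: per-sample gradients of each error term with respect to every head's query-projection slice are taken as already computed, the Hessian is proxied by the trace of the empirical Fisher (mean squared gradient norm), each error term is normalized across heads before mixing, and `head_sensitivity_scores` and `w_pose` are hypothetical names rather than the authors' code.

```python
from typing import List, Sequence

# A flattened gradient vector for one calibration sample (hypothetical layout).
Grad = Sequence[float]


def fisher_trace(per_sample_grads: Sequence[Grad]) -> float:
    """Trace of the empirical Fisher: mean squared gradient norm over samples."""
    n = len(per_sample_grads)
    return sum(sum(x * x for x in g) for g in per_sample_grads) / n


def normalize(scores: List[float]) -> List[float]:
    """Rescale scores to sum to 1 so the two error terms are on a common scale."""
    total = sum(scores)
    if total <= 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def head_sensitivity_scores(
    pose_grads: Sequence[Sequence[Grad]],
    point_grads: Sequence[Sequence[Grad]],
    w_pose: float = 0.5,
) -> List[float]:
    """Hypothetical HeSS proxy for one layer.

    pose_grads[h][n]: gradient of the camera-pose error on sample n with respect
    to head h's query-projection slice; point_grads is the same for the
    point-cloud error. Returns one combined score per head.
    """
    pose = normalize([fisher_trace(g) for g in pose_grads])
    point = normalize([fisher_trace(g) for g in point_grads])
    return [w_pose * p + (1.0 - w_pose) * q for p, q in zip(pose, point)]


if __name__ == "__main__":
    scores = head_sensitivity_scores(
        pose_grads=[[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]],
        point_grads=[[[0.0, 0.0]], [[1.0, 1.0]]],
    )
    print(scores)  # approximately [0.333, 0.667]
```

Normalizing each term first keeps one error type from dominating just because its raw gradients are larger; whether the paper does this is not stated in the summary.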
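The water-filling redistribution in the list above can be sketched as follows, assuming each head's budget is a keep ratio in [0, cap] (cap = 1.0 means fully dense), that the layer's total budget equals what a uniform pattern would spend (number of heads times the uniform keep ratio), and that the budget is poured proportionally to HeSS, with saturated heads clamped at `cap` and the leftover re-poured over the remaining heads. `redistribute_budget` is a hypothetical name; the paper's exact capping rule and any per-head minimum are not given in this summary.

```python
from typing import List, Sequence


def redistribute_budget(
    scores: Sequence[float], total_budget: float, cap: float = 1.0
) -> List[float]:
    """Water-filling: split total_budget in proportion to scores, capping each head."""
    n = len(scores)
    if not 0.0 <= total_budget <= cap * n:
        raise ValueError("budget must lie in [0, cap * num_heads]")
    alloc = [0.0] * n
    active = list(range(n))
    remaining = total_budget
    while active:
        weight = sum(scores[h] for h in active)
        # Proportional share of what is left; equal split if all active scores are zero.
        share = {
            h: remaining * (scores[h] / weight if weight > 0 else 1.0 / len(active))
            for h in active
        }
        overflow = [h for h in active if share[h] > cap]
        if not overflow:
            for h in active:
                alloc[h] = share[h]
            break
        # Clamp saturated heads and re-pour the remaining budget over the rest.
        for h in overflow:
            alloc[h] = cap
            remaining -= cap
        active = [h for h in active if h not in overflow]
    return alloc


if __name__ == "__main__":
    # Four heads at a uniform 50% keep ratio -> total budget 2.0.
    print(redistribute_budget([0.6, 0.3, 0.1, 0.0], total_budget=2.0))
```

With scores [0.5, 0.4, 0.1] and a budget of 2.5, the first head is capped in the first round, the second in the next, and the last head receives the remaining 0.5, which is the iterative capping the summary describes. A head with a zero score gets no budget in this sketch, so a practical version would likely reserve a floor.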
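To make the per-head budgets concrete, the sketch below turns each head's keep ratio into a block-level mask by keeping the top-scoring key blocks for every query block. The block scores (for example, from mean-pooled queries and keys), the top-k selection rule, and the function names are assumptions for illustration; the summary does not say how blocks are chosen, and a real implementation would hand such a mask to a block-sparse attention kernel instead of building it in Python.

```python
import math
from typing import List, Sequence


def head_block_mask(
    block_scores: Sequence[Sequence[float]], keep_ratio: float
) -> List[List[bool]]:
    """block_scores[i][j]: estimated importance of key block j for query block i.

    Keeps the top ceil(keep_ratio * num_key_blocks) key blocks per query block,
    always at least one so every query still attends somewhere.
    """
    num_kb = len(block_scores[0])
    # Small epsilon guards against float error, e.g. 0.3 * 10 = 3.0000000000000004.
    k = max(1, min(num_kb, math.ceil(keep_ratio * num_kb - 1e-9)))
    mask = []
    for row in block_scores:
        keep = set(sorted(range(num_kb), key=lambda j: row[j], reverse=True)[:k])
        mask.append([j in keep for j in range(num_kb)])
    return mask


def layer_block_masks(
    scores_per_head: Sequence[Sequence[Sequence[float]]],
    keep_ratios: Sequence[float],
) -> List[List[List[bool]]]:
    """One mask per head: denser for heads with larger keep ratios."""
    return [head_block_mask(s, r) for s, r in zip(scores_per_head, keep_ratios)]


if __name__ == "__main__":
    scores = [[0.1, 0.9, 0.5, 0.2], [0.8, 0.1, 0.3, 0.7]]
    for row in head_block_mask(scores, keep_ratio=0.5):
        print(row)
```

In this sketch a sensitive head with keep ratio 1.0 stays fully dense, while a robust head with a small ratio keeps only its single highest-scoring key block per query block.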

Experimental Results

Research Questions

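RQ1: Do attention heads in VGGT's global attention layers exhibit heterogeneous sensitivity to sparsification?
RQ2: Can the per-head Head Sensitivity Score (HeSS) effectively guide sparsification to preserve performance at high sparsity?
RQ3: Does HeSS-guided sparsification outperform uniform sparsification methods (such as SparseVGGT) and ViT sparsification on 3D vision tasks?
RQ4: How does combining camera pose and point cloud errors in HeSS affect robustness across sparsity levels?
RQ5: Does the proposed method generalize to related 3D vision transformers such as pi3?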

Key Findings

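HeSS reveals a markedly non-uniform distribution of head sensitivity: only a subset of heads accounts for most of the impact of sparsification.
Under high sparsity, HeSS-guided sparsification consistently improves performance on camera pose estimation and multi-view stereo benchmarks.
At higher sparsity levels, the method outperforms uniform sparsification and ViT sparsification baselines while preserving geometric reconstruction quality.
Combining both error components (camera pose and point cloud) in HeSS yields robustness across all sparsity levels, whereas using either component alone does not.
Budget redistribution with iterative capping is necessary to obtain an effective sparsity distribution and to preserve performance.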

This summary was generated by AI and reviewed by human editors.