QUICK REVIEW

[Paper Review] Tell Me Where to Look: Guided Attention Inference Network

Kunpeng Li, Ziyan Wu|arXiv (Cornell University)|Feb 27, 2018

Advanced Neural Network Applications | References: 36 | Cited by: 76

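ONE-SENTENCE SUMMARY

The paper proposes GAIN, an end-to-end framework that makes attention maps trainable and guides them through self-supervision and optional extra supervision, improving weakly supervised semantic segmentation and reaching state-of-the-art results on VOC 2012.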

ABSTRACT

Weakly supervised learning with only coarse labels can obtain visual explanations of deep neural network such as attention maps by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. In one common framework we address three shortcomings of previous approaches in modeling such attention maps: We (1) first time make attention maps an explicit and natural component of the end-to-end training, (2) provide self-guidance directly on these maps by exploring supervision from the network itself to improve them, and (3) seamlessly bridge the gap between using weak and extra supervision if available. Despite its simplicity, experiments on the semantic segmentation task demonstrate the effectiveness of our methods. We clearly surpass the state-of-the-art on Pascal VOC 2012 val. and test set. Besides, the proposed framework provides a way not only explaining the focus of the learner but also feeding back with direct guidance towards specific tasks. Under mild assumptions our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance.

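MOTIVATION AND OBJECTIVES

Learn from image-level labels only and still obtain attention maps reliable enough for localization and segmentation.
Make attention maps an explicit, trainable component of end-to-end training.
Provide self-guidance so that attention extends beyond the most discriminative regions.
Allow extra supervision to be integrated, bridging weak and full supervision.
Demonstrate state-of-the-art weakly supervised segmentation performance on PASCAL VOC 2012.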

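PROPOSED METHOD

A two-stream network with shared parameters: a classification stream (S_cl) and an attention mining stream (S_am).
Attention maps A^c are generated online by a Grad-CAM-like mechanism: gradients of the class score are global-average-pooled into channel weights, which are then used to combine the feature maps.
From A^c, a soft mask is derived and applied to the input to produce the masked image I*^c, which is fed to S_am; the attention mining loss L_am encourages exploration beyond the most discriminative regions.
The self-guidance loss L_self = L_cl + α L_am forces attention to cover more of the object; α is a weighting parameter (α = 1 in the experiments).
GAIN ext further tailors the attention maps by adding external supervision L_e (for example, pixel-level masks), giving L_ext = L_cl + α L_am + ω L_e (ω = 10 in the experiments).
During training, the attention maps serve as priors for weakly supervised segmentation frameworks such as SEC, providing better localization cues without full supervision.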
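
The steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the feature maps and gradients are taken as given arrays rather than computed by a network, the attention map is assumed to already match the image resolution, and the concrete forms used for the soft mask (a sigmoid threshold with assumed parameters `sigma` and `omega`), for L_am (mean class score on the masked images), and for L_e (mean squared difference to an external mask) are assumptions that the summary itself does not spell out.

```python
import numpy as np

def attention_map(features, grads):
    """Grad-CAM-style map from features and class-score gradients, both (K, H, W)."""
    weights = grads.mean(axis=(1, 2))               # global average pooling -> (K,)
    cam = np.tensordot(weights, features, axes=1)   # weighted sum of channels -> (H, W)
    return np.maximum(cam, 0.0)                     # keep positive evidence only

def soft_mask(attn, sigma=0.5, omega=10.0):
    """Assumed soft threshold: sigmoid(omega * (attn - sigma)).
    Presumes attn is scaled to roughly [0, 1]."""
    return 1.0 / (1.0 + np.exp(-omega * (attn - sigma)))

def masked_image(image, mask):
    """Suppress attended regions: I* = I - mask * I, image (C, H, W), mask (H, W)."""
    return image - mask[None] * image

def loss_am(scores_on_masked):
    """Assumed L_am: mean class score still assigned to the masked images."""
    return float(np.mean(scores_on_masked))

def loss_e(attn, external_mask):
    """Assumed L_e: mean squared difference between attention and an external mask."""
    return float(np.mean((attn - external_mask) ** 2))

def loss_self(l_cl, l_am, alpha=1.0):
    """L_self = L_cl + alpha * L_am."""
    return l_cl + alpha * l_am

def loss_ext(l_cl, l_am, l_e, alpha=1.0, omega=10.0):
    """L_ext = L_cl + alpha * L_am + omega * L_e."""
    return l_cl + alpha * l_am + omega * l_e
```

In an actual training loop these pieces would be differentiable operations inside an autograd framework, so that L_am can back-propagate through the mask into the shared parameters; the NumPy version only shows the data flow.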

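EXPERIMENTAL RESULTS

Research Questions

RQ1: Can attention maps be made an explicit, trainable component of end-to-end training for weakly supervised tasks?
RQ2: Does self-guidance on attention maps lead to more complete object coverage beyond the most discriminative regions?
RQ3: Does integrating extra supervision on attention maps further improve performance and robustness to bias in the training data?
RQ4: How does guided attention affect weakly supervised segmentation performance on VOC 2012?
RQ5: Can the GAIN framework serve as a plug-in for existing weakly supervised learners to improve generalization?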

Key Findings

Method	VOC val mIoU	VOC test mIoU
GAIN (ours)	55.3%	56.8%
GAIN ext (ours)	60.5%	62.1%
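
As a quick worked check on the table, the difference between the two rows can be computed directly. The snippet below only restates the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Reported mIoU (%) from the table above.
results = {
    "GAIN": {"val": 55.3, "test": 56.8},
    "GAIN ext": {"val": 60.5, "test": 62.1},
}

# Improvement of GAIN ext over GAIN in percentage points, per split.
gain = {
    split: round(results["GAIN ext"][split] - results["GAIN"][split], 1)
    for split in ("val", "test")
}
print(gain)
```

By this arithmetic, adding external supervision is worth 5.2 points on val and 5.3 points on test relative to GAIN alone.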

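GAIN reaches state-of-the-art mIoU under weak supervision on both the VOC 2012 val and test sets (55.3% val, 56.8% test).
With a small amount of pixel-level supervision, GAIN ext raises mIoU to 60.5% (val) and 62.1% (test).
Without pixel-level labels, GAIN-based SEC outperforms several weakly supervised methods, demonstrating the benefit of trainable attention maps.
In similar settings, adding pixel-level supervision in GAIN ext yields gains of up to 4.6–4.1 percentage points over competing methods.
Qualitative results show that GAIN extends attention to more complete object regions, which improves the segmentation priors.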
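
The findings above are reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed from a confusion matrix. The function name `mean_iou` is hypothetical, and this is not the official VOC evaluation code: it scores a single pair of label arrays and does not handle an ignore/void label.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both arrays
    return float((intersection[valid] / union[valid]).mean())
```

Benchmark evaluations typically accumulate the confusion matrix over the whole dataset before taking the per-class ratios, rather than averaging per-image scores.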

This review was generated by AI and reviewed by a human editor.