QUICK REVIEW

[Paper Review] DeepGaze II: Reading fixations from deep features trained on object recognition

Matthias Kümmerer, Thomas S. A. Wallis|arXiv (Cornell University)|Oct 5, 2016

Advanced Image and Video Retrieval Techniques|References: 5|Cited by: 260

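One-sentence summary

DeepGaze II predicts saliency by training a nonlinear readout on top of fixed VGG-19 features; with the readout pretrained on SALICON, it reaches state-of-the-art information gain and top AUC/sAUC scores on MIT300 without retraining the base network.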

ABSTRACT

Here we present DeepGaze II, a model that predicts where people look in images. The model uses the features from the VGG-19 deep neural network trained to identify objects in images. Contrary to other saliency models that use deep features, here we use the VGG features for saliency prediction with no additional fine-tuning (rather, a few readout layers are trained on top of the VGG features to predict saliency). The model is therefore a strong test of transfer learning. After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in the patterns of fixations and achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark. These results corroborate the finding from DeepGaze I (which explained 56% of the explainable information gain), that deep features trained on object recognition provide a versatile feature space for performing related visual tasks. We explore the factors that contribute to this success and present several informative image examples. A web service is available to compute model predictions at http://deepgaze.bethgelab.org.

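Motivation and objectives

Show that fixed deep features trained on object recognition can serve as a strong feature space for saliency prediction without fine-tuning.
Quantify the model's information-theoretic performance (explained information gain) on benchmark datasets.
Evaluate performance on MIT300 and compare it with earlier saliency models.
Show how pretraining and the choice of features affect saliency prediction.
Model the center bias explicitly and assess its effect on the predictions.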

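Proposed method

Extract VGG-19 features (conv5_1, relu5_1, relu5_2, conv5_3, relu5_4) and rescale them to a common resolution.
Train a readout network of four 1x1 convolution layers on top of the fixed VGG features to produce a saliency score O(x,y).
Blur O(x,y) with a Gaussian, add a center-bias prior, and apply a softmax to obtain a probability map p(x,y).
Train by maximum likelihood (log-likelihood) within a probabilistic framework, and use information gain as the evaluation metric.
Pretrain the readout on SALICON, then fine-tune it on MIT1003 with image-wise cross-validation; evaluate on the MIT300 hold-out set.
Because the readout uses only 1x1 convolutions, what it can learn is restricted to pointwise nonlinear combinations of the VGG features.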
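
The readout pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for the VGG-19 activations, the hidden layer widths, the ReLU nonlinearity, the blur width `sigma`, and the uniform center-bias prior are all illustrative choices, and every function name is hypothetical. The center-bias prior is assumed to be added in log space before the softmax.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at roughly 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur2d(img, sigma):
    """Separable Gaussian blur with edge padding; output has the input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def readout(features, weights, biases):
    """Stack of 1x1 convolutions on an (H, W, C) feature map.

    A 1x1 convolution is the same linear map applied at every pixel,
    so it reduces to a matrix product over the channel axis.
    """
    h = features
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # assumed ReLU between layers
    return h[..., 0]  # saliency score O(x, y)

def log_density(features, weights, biases, log_center_bias, sigma):
    """Blur O(x, y), add the log center-bias prior, and apply a log-softmax."""
    s = blur2d(readout(features, weights, biases), sigma) + log_center_bias
    s = s - s.max()  # numerical stability
    return s - np.log(np.exp(s).sum())

def mean_log_likelihood(log_p, fixations):
    """Average log-probability at fixated pixels given as (x, y) pairs."""
    return float(np.mean([log_p[y, x] for (x, y) in fixations]))

# Toy usage: random features stand in for the fixed VGG-19 activations.
rng = np.random.default_rng(0)
H, W, C = 12, 16, 8
features = rng.normal(size=(H, W, C))
widths = [C, 16, 32, 2, 1]  # four 1x1 layers; widths are illustrative
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(widths[:-1], widths[1:])]
biases = [np.zeros(b) for b in widths[1:]]
log_center_bias = np.full((H, W), -np.log(H * W))  # uniform prior as a placeholder

log_p = log_density(features, weights, biases, log_center_bias, sigma=1.5)
ll = mean_log_likelihood(log_p, [(3, 2), (8, 6), (15, 11)])
```

Training by maximum likelihood then amounts to adjusting only `weights`, `biases`, and the blur and prior parameters so that `mean_log_likelihood` increases, while the feature extractor stays fixed.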

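Experimental results

Research questions

RQ1: Can fixed deep features trained on object recognition (VGG-19) provide a strong, transferable feature space for saliency prediction without retraining the feature extractor?
RQ2: How much of the explainable information gain can a saliency model reach with pretrained deep features and a learnable readout?
RQ3: How much do pretraining (SALICON) and feature choice (VGG vs. AlexNet) each contribute to saliency performance?
RQ4: How does the proposed probabilistic readout with center bias perform on the standard saliency metrics of the MIT300 benchmark?
RQ5: What qualitative insights come from comparing DeepGaze II's predictions with gold-standard fixations?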
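
The quantity behind RQ2 can be made concrete with a small worked example. The sketch below assumes the usual definition: information gain is the difference in average log-likelihood (in bits per fixation) between a model and a center-bias baseline, and the explained fraction divides the model's gain by the gain of a gold-standard model. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math

def mean_log2(probs_at_fixations):
    """Average log2 probability assigned to the fixated locations."""
    return sum(math.log2(p) for p in probs_at_fixations) / len(probs_at_fixations)

def information_gain(model_probs, baseline_probs):
    """Bits per fixation gained over the baseline."""
    return mean_log2(model_probs) - mean_log2(baseline_probs)

def information_gain_explained(model_probs, baseline_probs, gold_probs):
    """Fraction of the gold standard's information gain reached by the model."""
    return (information_gain(model_probs, baseline_probs)
            / information_gain(gold_probs, baseline_probs))

# Toy numbers: the model assigns 4x the baseline probability to every
# fixation (2 bits) and the gold standard assigns 8x (3 bits).
baseline = [0.001, 0.001, 0.001]
model = [0.004, 0.004, 0.004]
gold = [0.008, 0.008, 0.008]

ratio = information_gain_explained(model, baseline, gold)  # 2 bits / 3 bits
```

Under this definition, a reported value of 87% means the model recovers most of the gap between the center-bias baseline and the gold standard.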

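Key findings

DeepGaze II explains about 87% of the explainable information gain on the MIT1003 evaluation subset, a substantial improvement over DeepGaze I (56%).
On MIT300, DeepGaze II achieves the highest AUC and shuffled AUC (sAUC) on the MIT Saliency Benchmark (88% AUC when the center bias is included, 77% sAUC).
DeepGaze II comes close to gold-standard performance, and on no image in the evaluation subset does it predict worse than the center-bias baseline.
Using pretrained VGG features and pretraining on SALICON are the largest contributors to the improvement over DeepGaze I.
The model keeps its strong performance without retraining the VGG features, relying only on a small 1x1 readout and a probabilistic formulation.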
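
For readers unfamiliar with the two benchmark metrics, the following is a simplified sketch of AUC and shuffled AUC. It treats AUC as the probability that a fixated location scores higher than a non-fixation sample, with ties counted as one half. For plain AUC the negatives are assumed to be all pixels of the image; for sAUC they are fixation locations taken from other images, which penalizes models that rely on center bias alone. The benchmark's exact procedure differs in details, and all names and values here are hypothetical.

```python
def auc(positives, negatives):
    """Probability that a random positive outranks a random negative."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def values_at(saliency, points):
    """Saliency values at (x, y) points; saliency is a list of rows."""
    return [saliency[y][x] for (x, y) in points]

def auc_uniform(saliency, fixations):
    """AUC with every pixel of the image used as a negative sample."""
    negatives = [v for row in saliency for v in row]
    return auc(values_at(saliency, fixations), negatives)

def shuffled_auc(saliency, fixations, other_image_fixations):
    """AUC with fixations from other images used as negative samples."""
    return auc(values_at(saliency, fixations),
               values_at(saliency, other_image_fixations))

# Toy 3x3 saliency map with two fixations on this image and
# three fixation locations borrowed from other images.
saliency = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.3],
    [0.1, 0.3, 0.1],
]
fixations = [(1, 1), (2, 1)]
other_image_fixations = [(1, 0), (0, 1), (1, 2)]

plain = auc_uniform(saliency, fixations)
shuffled = shuffled_auc(saliency, fixations, other_image_fixations)
```

A value of 0.5 corresponds to chance and 1.0 to a perfect ranking of fixated over non-fixated locations.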

This review was generated by AI and reviewed by a human editor.