QUICK REVIEW

[Paper Review] SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes

Marc Assens, Kevin McGuinness|arXiv (Cornell University)|Jul 11, 2017

Visual Attention and Saliency Detection | References: 34 | Cited by: 106

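One-Sentence Summary

SaltiNet is a CNN that predicts temporally aware saliency volumes for 360° images and samples scanpaths from them, achieving top performance in the Salient360! 2017 challenge.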

ABSTRACT

We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree images. The model is based on a temporal-aware novel representation of saliency information named the saliency volume. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are fit by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies over these volumes are used to generate scanpaths over the 360-degree images. Our experiments show the advantages of using saliency volumes, and how they can be used for related tasks. Our source code and trained models are available at https://github.com/massens/saliency-360salient-2017.

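Motivation and Objectives

Introduce saliency volumes to capture the temporal nature of eye movements in 360° images.
Propose SaltiNet, which generates scanpaths from predicted saliency volumes.
Show that saliency volumes enable effective scanpath sampling and related tasks.
Demonstrate state-of-the-art performance on the Salient360! 2017 benchmark.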

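Proposed Method

Predict saliency volumes with a CNN architecture initialized from VGG-16, trained with a BCE loss over downsampled volumes.
Build saliency volumes by quantizing fixation timestamps, creating a binary fixation volume, and convolving it with a multivariate Gaussian kernel.
Output a 12×300×600 saliency volume whose axes are time, height, and width, used for both training and sampling.
Train via transfer learning from a saliency-map model (SALICON) and from volume prediction on iSUN, then fine-tune on a head/eye movement dataset captured in VR (Oculus DK2).
Sample scanpaths from the saliency volume by drawing a fixation point in each temporal slice according to the learned distribution, combined with a spatial sampling strategy; the best results come from limiting how far the fixation point can move between steps.
Evaluate with a variant of the Jarodzka similarity metric adapted to 360° (equirectangular projection), combined with Hungarian matching.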
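
The volume-construction step described above (quantize timestamps, mark fixations in a binary volume, smooth with a Gaussian) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the sigma values, the normalized fixation format, and the horizontal wrap-around at the image seam are all assumptions, and a separable Gaussian is used here as a stand-in for a multivariate Gaussian with diagonal covariance. The default grid is deliberately smaller than the 12×300×600 volume so the example runs quickly.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalized 1-D Gaussian kernel truncated at `truncate` standard deviations."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_along_axis(vol, sigma, axis, wrap=False):
    """Convolve `vol` with a 1-D Gaussian along one axis, preserving its shape."""
    k = gaussian_kernel_1d(sigma)
    radius = len(k) // 2
    pad = [(0, 0)] * vol.ndim
    pad[axis] = (radius, radius)
    padded = np.pad(vol, pad, mode="wrap" if wrap else "constant")
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), axis, padded
    )

def build_saliency_volume(fixations, duration, shape=(12, 30, 60),
                          sigmas=(1.0, 2.0, 2.0)):
    """Build a (time, height, width) saliency volume.

    fixations: iterable of (t_seconds, y, x), with y and x normalized
               to [0, 1) (hypothetical input format).
    duration:  total viewing time in seconds, used to quantize timestamps.
    sigmas:    assumed Gaussian widths per axis, in bins.
    """
    T, H, W = shape
    vol = np.zeros(shape, dtype=np.float64)
    # Quantize each fixation into a (time, row, column) bin and mark it.
    for t, y, x in fixations:
        ti = min(int(t / duration * T), T - 1)
        yi = min(int(y * H), H - 1)
        xi = min(int(x * W), W - 1)
        vol[ti, yi, xi] = 1.0
    # Separable smoothing; wrapping the width axis is an assumption made
    # here because equirectangular images are continuous at the seam.
    for axis, sigma in enumerate(sigmas):
        vol = smooth_along_axis(vol, sigma, axis, wrap=(axis == 2))
    peak = vol.max()
    return vol / peak if peak > 0 else vol
```

Calling `build_saliency_volume(fixations, duration=20.0, shape=(12, 300, 600))` would produce a volume with the dimensions quoted above; the 20-second duration is only a placeholder value.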

Experimental Results

Research Questions

RQ1: Can temporally-aware saliency volumes improve scan-path prediction for 360° images?
RQ2: What sampling strategy over saliency volumes yields realistic scanpaths?
RQ3: How does SaltiNet perform compared to other Salient360! entrants?
RQ4: What are the limitations of sampling-based scanpath generation from volumes and how can they be mitigated?

Key Findings

SaltiNet with the distance-limiting sampling strategy (2) achieves the best 1–0–1 scoring among sampling strategies (Jarodzka score 2.27, lower is better).
Compared to random or naive sampling, SaltiNet-based sampling substantially improves scanpath realism (random 4.94; naive 3.45; distance-limited 2.27).
Sampling ground truth saliency maps/volumes yields even better alignment (1.89 and 1.79, respectively).
Ground-truth scanpaths score far lower (1.2e-8 in the reported metric), indicating a large gap between generated and true paths. SaltiNet submissions nonetheless outperform two other Salient360! entrants (e.g., SJTU 4.6565, Wuhan University 5.9517).
Training converges in about two hours on an NVIDIA GTX Titan X using Keras/Theano.
SaltiNet won the best scanpath solution at the Salient360! challenge in ICME 2017.
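
To make the distance-limited strategy in the findings above more concrete, here is a rough sketch of sampling one fixation per temporal slice while capping how far consecutive fixations may move. It is an illustration under stated assumptions rather than the authors' implementation: the function name `sample_scanpath`, the `max_step` threshold (a fraction of the image size), the wrap-around distance in the horizontal direction, and the fallback when no probability mass lies within reach are all hypothetical choices.

```python
import numpy as np

def sample_scanpath(volume, max_step=0.15, rng=None):
    """Sample one (t, row, col) fixation per temporal slice of `volume`.

    volume:   non-negative array of shape (T, H, W).
    max_step: assumed cap on the normalized distance between consecutive
              fixations (hypothetical parameter, not taken from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, W = volume.shape
    ys, xs = np.mgrid[0:H, 0:W]
    path, prev = [], None
    for t in range(T):
        probs = volume[t].astype(np.float64).copy()
        if prev is not None:
            dy = (ys - prev[0]) / H
            dx = np.abs(xs - prev[1])
            # Horizontal distance wraps around the equirectangular seam
            # (an assumption of this sketch).
            dx = np.minimum(dx, W - dx) / W
            probs[np.hypot(dy, dx) > max_step] = 0.0
        if probs.sum() <= 0:
            # Fallback: uniform on the first slice, otherwise stay put.
            probs = np.ones((H, W)) if prev is None else np.zeros((H, W))
            if prev is not None:
                probs[prev] = 1.0
        idx = rng.choice(H * W, p=(probs / probs.sum()).ravel())
        prev = divmod(int(idx), W)  # (row, col)
        path.append((t, prev[0], prev[1]))
    return path
```

Setting `max_step` to a value of 1.0 or more effectively disables the limit, which gives a rough analogue of unconstrained per-slice sampling for comparison.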


This review was generated by AI and reviewed by human editors.