[Paper Review] SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes

Marc Assens, Kevin McGuinness | arXiv (Cornell University) | Jul 11, 2017
Visual Attention and Saliency Detection | References: 34 | Cited by: 106
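ONE-SENTENCE SUMMARY

SaltiNet is a CNN that predicts temporally aware saliency volumes for 360° images and samples scanpaths from them, achieving top performance in the Salient360! 2017 challenge.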

ABSTRACT

We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree images. The model is based on a temporal-aware novel representation of saliency information named the saliency volume. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are fit by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies over these volumes are used to generate scanpaths over the 360-degree images. Our experiments show the advantages of using saliency volumes, and how they can be used for related tasks. Our source code and trained models are available at https://github.com/massens/saliency-360salient-2017.

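MOTIVATION AND OBJECTIVES

  • Introduce saliency volumes to capture the temporal characteristics of eye movements on 360° images.
  • Propose SaltiNet, which generates scanpaths from predicted saliency volumes.
  • Show that saliency volumes enable effective scanpath sampling and related tasks.
  • Demonstrate state-of-the-art performance on the Salient360! 2017 benchmark.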

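PROPOSED METHOD

  • Predict saliency volumes with a CNN architecture initialized from VGG-16 and trained with a BCE loss on downsampled volumes.
  • Build ground-truth saliency volumes by quantizing fixation timestamps, creating a binary fixation volume, and convolving it with a multivariate Gaussian kernel.
  • Output a 12×300×600 saliency volume (time × height × width) that is used for both training and sampling.
  • Train with transfer learning from a saliency-map model (SALICON) and from volume prediction on iSUN, then fine-tune on a head/eye-movement dataset captured in VR (Oculus DK2).
  • Sample scanpaths from the saliency volume by drawing fixations in each temporal slice according to the learned distribution, using a spatial sampling strategy; the best results come from limiting how far the fixation can move between steps.
  • Evaluate with a variant of the Jarodzka similarity metric adapted to 360° (equirectangular projection) and Hungarian matching.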
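
The volume construction and the distance-limited sampling described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the diagonal Gaussian (`sigma` per axis), the normalization by the peak value, the choice of exactly one fixation per temporal slice, and the use of plain pixel distance (which ignores the horizontal wrap-around of a 360° image) are all hypothetical choices made here for brevity.

```python
import math
import random

def build_saliency_volume(fixations, duration, shape, sigma):
    """Build a T x H x W volume from (t_seconds, row, col) fixations.

    shape = (T, H, W); sigma = (sigma_t, sigma_y, sigma_x) is an assumed
    axis-aligned Gaussian, not a value taken from the paper.
    """
    T, H, W = shape
    st, sy, sx = sigma
    # Quantize each timestamp into one of T temporal slices; the set of
    # occupied cells is the binary fixation volume.
    cells = {(min(int(t / duration * T), T - 1), r, c) for t, r, c in fixations}
    vol = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for k, r, c in cells:
        # Summing one Gaussian per occupied cell equals convolving the binary
        # volume with that Gaussian kernel (zero padding at the borders).
        gt = [math.exp(-0.5 * ((i - k) / st) ** 2) for i in range(T)]
        gy = [math.exp(-0.5 * ((j - r) / sy) ** 2) for j in range(H)]
        gx = [math.exp(-0.5 * ((x - c) / sx) ** 2) for x in range(W)]
        for i in range(T):
            for j in range(H):
                w = gt[i] * gy[j]
                row = vol[i][j]
                for x in range(W):
                    row[x] += w * gx[x]
    # Assumption: scale to [0, 1] so the volume can serve as a BCE target.
    peak = max(v for plane in vol for row in plane for v in row)
    if peak > 0:
        vol = [[[v / peak for v in row] for row in plane] for plane in vol]
    return vol

def sample_scanpath(vol, max_step, rng):
    """Draw one fixation per temporal slice, at most max_step pixels apart."""
    path = []
    for plane in vol:
        cells, weights = [], []
        for r, row in enumerate(plane):
            for c, v in enumerate(row):
                # Skip cells that are too far from the previous fixation.
                if path and math.hypot(r - path[-1][0], c - path[-1][1]) > max_step:
                    continue
                cells.append((r, c))
                weights.append(v)
        if sum(weights) <= 0:
            weights = [1.0] * len(cells)  # fall back to uniform sampling
        path.append(rng.choices(cells, weights=weights, k=1)[0])
    return path

# Toy example on a small grid; the model itself works at 12 x 300 x 600.
fixations = [(0.5, 3, 4), (7.0, 5, 12), (14.0, 2, 17)]
volume = build_saliency_volume(fixations, duration=15.0,
                               shape=(4, 8, 20), sigma=(1.0, 1.5, 1.5))
scanpath = sample_scanpath(volume, max_step=5, rng=random.Random(0))
print(scanpath)
```

The nested loops keep the sketch dependency-free; at the full 12×300×600 resolution an array library would be the practical choice.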

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

  • RQ1: Can temporally aware saliency volumes improve scanpath prediction for 360° images?
  • RQ2: What sampling strategy over saliency volumes yields realistic scanpaths?
  • RQ3: How does SaltiNet perform compared to other Salient360! entrants?
  • RQ4: What are the limitations of sampling-based scanpath generation from volumes, and how can they be mitigated?

KEY FINDINGS

  • SaltiNet with the distance-limiting sampling strategy (strategy 2) achieves the best score among the sampling strategies (Jarodzka score 2.27; lower is better).
  • Compared to random or naive sampling, SaltiNet-based sampling substantially improves scanpath realism (random 4.94; naive 3.45; distance-limited 2.27).
  • Sampling ground truth saliency maps/volumes yields even better alignment (1.89 and 1.79, respectively).
  • Ground-truth scanpaths score far lower (1.2e-8 on the reported metric), which indicates a large gap between generated and true paths. SaltiNet's submissions still outperform two other Salient360! entrants (SJTU 4.6565, Wuhan University 5.9517).
  • Training converges in about two hours on an NVIDIA GTX Titan X using Keras/Theano.
  • SaltiNet won the best scanpath solution at the Salient360! challenge in ICME 2017.
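
To give a feel for the matching step behind these scores, the sketch below pairs the fixations of two scanpaths on an equirectangular image so that the total great-circle distance is minimal. It is not the Jarodzka metric and not the challenge's evaluation code: the helper names are hypothetical, averaging the matched distances is an assumption, and the optimum is found by exhaustive search over permutations rather than by the Hungarian algorithm (both return the same minimum-cost assignment, but the exhaustive search is only practical for short paths).

```python
import math
from itertools import permutations

def to_lonlat(row, col, height, width):
    """Map an equirectangular pixel centre to (longitude, latitude) in radians."""
    lon = (col + 0.5) / width * 2 * math.pi - math.pi
    lat = math.pi / 2 - (row + 0.5) / height * math.pi
    return lon, lat

def great_circle(a, b):
    """Haversine distance in radians between two (lon, lat) points."""
    (lon1, lat1), (lon2, lat2) = a, b
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))

def matched_distance(path_a, path_b, height, width):
    """Mean great-circle distance under the minimum-cost one-to-one matching."""
    if len(path_a) != len(path_b):
        raise ValueError("this sketch assumes scanpaths of equal length")
    if not path_a:
        return 0.0
    a = [to_lonlat(r, c, height, width) for r, c in path_a]
    b = [to_lonlat(r, c, height, width) for r, c in path_b]
    cost = [[great_circle(p, q) for q in b] for p in a]
    n = len(a)
    # Exhaustive search stands in for the Hungarian algorithm here.
    best = min(sum(cost[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))
    return best / n

# Columns 0 and 599 sit next to each other across the 360° seam, so the
# spherical distance treats them as close even though they are 599 pixels apart.
truth = [(150, 0), (100, 300), (200, 450)]
generated = [(150, 599), (110, 310), (190, 440)]
print(matched_distance(truth, generated, height=300, width=600))
```

Measuring distance on the sphere rather than in pixels is what keeps fixations on opposite edges of the equirectangular image from being penalized as far apart.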


This review was generated by AI and reviewed by a human editor.