QUICK REVIEW

[Paper Review] SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes

Marc Assens, Kevin McGuinness|arXiv (Cornell University)|Jul 11, 2017

Visual Attention and Saliency Detection | References: 34 | Citations: 106

One-line summary

SaltiNet is a CNN that predicts temporally-aware saliency volumes for 360° images and samples scan-paths from them; it achieved top performance in the Salient360! 2017 challenge.

ABSTRACT

We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree images. The model is based on a temporal-aware novel representation of saliency information named the saliency volume. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are fit by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies over these volumes are used to generate scanpaths over the 360-degree images. Our experiments show the advantages of using saliency volumes, and how they can be used for related tasks. Our source code and trained models are available at https://github.com/massens/saliency-360salient-2017.
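
For reference, the BCE loss mentioned in the abstract can be written in its standard voxel-wise form. The symbols here are assumptions introduced for illustration and are not taken from the paper: $S_j \in [0, 1]$ is the ground-truth value of voxel $j$ in the downsampled saliency volume, $\hat{S}_j$ is the predicted value, and $N$ is the number of voxels.

```latex
\mathcal{L}_{\mathrm{BCE}}
  = -\frac{1}{N} \sum_{j=1}^{N}
    \left[ S_j \log \hat{S}_j + (1 - S_j) \log\left(1 - \hat{S}_j\right) \right]
```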

Motivation and Objectives

Introduce saliency volumes to capture the temporal nature of eye-gaze in 360° images.
Propose SaltiNet to generate scan-paths from predicted saliency volumes.
Show that saliency volumes enable effective scanpath sampling and related tasks.
Demonstrate state-of-the-art performance on the Salient360! 2017 benchmark.

Proposed Method

Predict saliency volumes with a CNN architecture initialized from VGG-16 and trained with BCE loss over downsampled volumes.
Construct saliency volumes by quantizing fixation timestamps, creating a binary fixation volume, and convolving with a multivariate Gaussian kernel.
Output a 12×300×600 saliency volume representing time, height, and width for training and sampling.
Train using transfer learning from saliency map models (SALICON) and volume prediction on iSUN, then fine-tune on a head/eye movement dataset captured in VR (Oculus DK2).
Sample scan-paths from saliency volumes by drawing fixations per temporal slice according to learned distributions and by using spatial sampling strategies; best results come from constraining fixation movement between steps.
Evaluate using a variant of the Jarodzka similarity measure adapted to 360° (equirectangular) and Hungarian matching.
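
The volume construction and distance-limited sampling steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the normalized `(timestamp, y, x)` fixation format, the `duration`, `sigma`, and `max_step` parameters, the peak normalization, and the hard distance cutoff are all assumptions made for the example. The demo uses a 12×30×60 grid instead of 12×300×600 to keep it fast.

```python
import numpy as np


def build_saliency_volume(fixations, duration, shape=(12, 300, 600),
                          sigma=(1.0, 20.0, 20.0)):
    """Build a (time, height, width) volume from fixations.

    fixations: iterable of (timestamp, y, x), with timestamp in [0, duration]
    and y, x normalized to [0, 1). All of this is a hypothetical input format.
    """
    T, H, W = shape
    # Quantize timestamps into T slices and mark fixations in a binary volume.
    binary = np.zeros(shape, dtype=bool)
    for t, y, x in fixations:
        ti = min(int(t / duration * T), T - 1)
        yi = min(int(y * H), H - 1)
        xi = min(int(x * W), W - 1)
        binary[ti, yi, xi] = True
    # Convolve with a multivariate Gaussian. Summing one Gaussian per marked
    # voxel is equivalent to a zero-padded convolution with an untruncated kernel.
    tt, yy, xx = np.meshgrid(np.arange(T), np.arange(H), np.arange(W),
                             indexing="ij")
    volume = np.zeros(shape)
    for ti, yi, xi in np.argwhere(binary):
        volume += np.exp(-0.5 * (((tt - ti) / sigma[0]) ** 2
                                 + ((yy - yi) / sigma[1]) ** 2
                                 + ((xx - xi) / sigma[2]) ** 2))
    peak = volume.max()
    # Assumption: scale to [0, 1] so the volume can serve as a BCE target.
    return volume / peak if peak > 0 else volume


def sample_scanpath(volume, max_step=10.0, rng=None):
    """Draw one (row, col) fixation per temporal slice, keeping each fixation
    within max_step pixels of the previous one (a hard cutoff, assumed here)."""
    rng = np.random.default_rng() if rng is None else rng
    T, H, W = volume.shape
    yy, xx = np.mgrid[0:H, 0:W]
    path = []
    for t in range(T):
        weights = volume[t].astype(float)
        if path:
            py, px = path[-1]
            allowed = np.hypot(yy - py, xx - px) <= max_step
            weights = np.where(allowed, weights, 0.0)
            if weights.sum() <= 0:
                # No saliency mass nearby: fall back to uniform in the disc.
                weights = allowed.astype(float)
        elif weights.sum() <= 0:
            weights = np.ones((H, W))
        probs = (weights / weights.sum()).ravel()
        idx = rng.choice(H * W, p=probs)
        path.append((int(idx // W), int(idx % W)))
    return path


# Hypothetical fixations over a 15-second viewing window.
fixations = [(0.5, 0.5, 0.2), (6.0, 0.4, 0.5), (14.0, 0.6, 0.8)]
volume = build_saliency_volume(fixations, duration=15.0,
                               shape=(12, 30, 60), sigma=(1.0, 2.0, 2.0))
scanpath = sample_scanpath(volume, max_step=10.0,
                           rng=np.random.default_rng(0))
```

Dropping the `allowed` mask turns this into naive per-slice sampling, which makes the effect of the distance constraint easy to compare. The sketch ignores horizontal wrap-around at the left and right edges of the equirectangular image.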

Experimental Results

Research Questions

RQ1: Can temporally-aware saliency volumes improve scan-path prediction for 360° images?
RQ2: What sampling strategy over saliency volumes yields realistic scanpaths?
RQ3: How does SaltiNet perform compared to other Salient360! entrants?
RQ4: What are the limitations of sampling-based scanpath generation from volumes and how can they be mitigated?

Key Findings

SaltiNet with the distance-limiting sampling strategy (2) achieves the best score among the sampling strategies compared (Jarodzka score 2.27; lower is better).
Compared to random or naive sampling, SaltiNet-based sampling substantially improves scanpath realism (random 4.94; naive 3.45; distance-limited 2.27).
Sampling ground truth saliency maps/volumes yields even better alignment (1.89 and 1.79, respectively).
Ground-truth scan-paths score far lower (1.2e-8 in the reported metric), indicating a large gap between generated and true paths. SaltiNet's submission nonetheless outperforms two other Salient360! entrants (e.g., SJTU 4.6565, Wuhan University 5.9517).
Training converges in about two hours on an NVIDIA GTX Titan X using Keras/Theano.
SaltiNet was named the best scanpath solution at the Salient360! challenge at ICME 2017.
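
The Jarodzka scores above rest on comparing a generated scan-path with a reference one on the sphere. The sketch below illustrates only one ingredient of such a comparison: a minimum-cost one-to-one matching of fixations under great-circle distance. It is not the challenge metric. The use of great-circle distance, the helper names, and the sample paths are assumptions. For brevity, a brute-force search over permutations stands in for Hungarian matching, which finds the same minimum-cost assignment in polynomial time.

```python
import math
from itertools import permutations


def pixel_to_lonlat(x, y, width, height):
    """Map equirectangular pixel coordinates to (longitude, latitude) in degrees."""
    lon = x / width * 360.0 - 180.0
    lat = 90.0 - y / height * 180.0
    return lon, lat


def great_circle(p, q):
    """Angular distance in radians between two (lon, lat) points in degrees,
    using the haversine formula so that longitude wraps around at +/-180."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def min_cost_matching(path_a, path_b):
    """Return (assignment, total_cost) minimizing summed great-circle distance.

    Brute force over all pairings: only suitable for short, equal-length paths.
    assignment[i] is the index in path_b matched to path_a[i].
    """
    if len(path_a) != len(path_b):
        raise ValueError("this sketch only handles equal-length paths")

    def total(perm):
        return sum(great_circle(path_a[i], path_b[j])
                   for i, j in enumerate(perm))

    best = min(permutations(range(len(path_a))), key=total)
    return best, total(best)


# Hypothetical scan-paths as (longitude, latitude) pairs in degrees.
path_a = [(0, 0), (90, 0), (170, 0)]
path_b = [(-170, 0), (0, 10), (80, 0)]
perm, cost = min_cost_matching(path_a, path_b)
print(perm, math.degrees(cost))
```

In this example the point at longitude 170° is matched to the one at −170°, 20° away across the seam of the image. A plain pixel-space distance would treat those two points as nearly a full image width apart.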

This review was generated by AI and checked by a human editor.