[Paper Review] SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes
SaltiNet is a CNN that predicts temporally aware saliency volumes for 360° images and samples scanpaths from them. It achieved top performance in the Salient360! 2017 challenge.
We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree images. The model is based on a novel temporal-aware representation of saliency information named the saliency volume. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are fit by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies over these volumes are used to generate scanpaths over the 360-degree images. Our experiments show the advantages of using saliency volumes, and how they can be used for related tasks. Our source code and trained models are available at https://github.com/massens/saliency-360salient-2017.
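As a rough illustration of the training objective described in the abstract, the sketch below computes a BCE loss between a predicted volume and a spatially downsampled target. It is written in plain NumPy rather than the Keras/Theano stack the authors used. The function names, the average-pooling downsampler, the pooling factor, and the `eps` clipping value are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def downsample(volume, factor):
    """Average-pool the two spatial axes of a (T, H, W) volume by `factor`.

    Average pooling is an assumption; the review does not say how the
    volumes were downsampled.
    """
    t, h, w = volume.shape
    assert h % factor == 0 and w % factor == 0, "H and W must divide evenly"
    blocks = volume.reshape(t, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy between predictions and targets in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    losses = target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)
    return float(-np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target_full = rng.random((12, 300, 600))   # stand-in ground-truth volume
    target_small = downsample(target_full, 10)  # shape (12, 30, 60)
    pred_small = rng.random(target_small.shape)  # stand-in network output
    print(target_small.shape, bce_loss(pred_small, target_small))
```

Because the targets are smoothed volumes rather than hard 0/1 labels, the loss here is BCE against soft targets, which is well defined for any target value in [0, 1].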
Research Motivation and Objectives
- Introduce saliency volumes to capture the temporal nature of eye-gaze in 360° images.
- Propose SaltiNet to generate scanpaths from predicted saliency volumes.
- Show that saliency volumes enable effective scanpath sampling and related tasks.
- Demonstrate state-of-the-art performance on the Salient360! 2017 benchmark.
Proposed Method
- Predict saliency volumes with a CNN architecture initialized from VGG-16 and trained with BCE loss over downsampled volumes.
- Construct saliency volumes by quantizing fixation timestamps, creating a binary fixation volume, and convolving with a multivariate Gaussian kernel.
- Output a 12×300×600 saliency volume whose axes are time, height, and width; this volume is used both for training and for sampling.
- Train with transfer learning: start from saliency map models (SALICON), train volume prediction on iSUN, then fine-tune on a head/eye movement dataset captured in VR (Oculus DK2).
- Sample scanpaths from saliency volumes by drawing fixations per temporal slice according to the learned distributions, combined with spatial sampling strategies; the best results come from constraining how far a fixation can move between steps.
- Evaluate using a variant of the Jarodzka similarity measure adapted to 360° (equirectangular) images, together with Hungarian matching.
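The volume construction and the distance-constrained sampling described in this list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Gaussian widths in `sigma` and the `max_step` limit are made-up values, the smoothing uses an axis-aligned Gaussian, and the step limit is measured in pixels on the equirectangular grid (with horizontal wrap-around) rather than as a true angular distance.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def build_saliency_volume(fixations, duration, shape=(12, 300, 600),
                          sigma=(1.0, 8.0, 8.0)):
    """Build a (time, height, width) saliency volume from fixations.

    fixations: iterable of (timestamp, y, x), with y and x normalized to [0, 1).
    duration:  total viewing time, in the same unit as the timestamps.
    sigma:     Gaussian width per axis, in voxels (assumed values).
    """
    n_t, n_h, n_w = shape
    volume = np.zeros(shape, dtype=np.float64)
    for t, y, x in fixations:
        # Quantize the timestamp into one of n_t temporal slices.
        ti = min(int(t / duration * n_t), n_t - 1)
        yi = min(int(y * n_h), n_h - 1)
        xi = min(int(x * n_w), n_w - 1)
        volume[ti, yi, xi] = 1.0  # binary fixation volume
    # Convolve with an axis-aligned multivariate Gaussian kernel.
    volume = gaussian_filter(volume, sigma=sigma, mode="constant")
    peak = volume.max()
    return volume / peak if peak > 0 else volume


def sample_scanpath(volume, max_step=0.25, rng=None):
    """Draw one fixation per temporal slice, limiting the jump between steps.

    max_step is the largest allowed move as a fraction of the image width
    (an assumed parameter, not a value from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_t, n_h, n_w = volume.shape
    ys, xs = np.mgrid[0:n_h, 0:n_w]
    path, prev = [], None
    for t in range(n_t):
        probs = volume[t].astype(np.float64).copy()
        if prev is not None:
            # Horizontal distance wraps around in an equirectangular image.
            dx = np.abs(xs - prev[1])
            dx = np.minimum(dx, n_w - dx)
            dy = ys - prev[0]
            probs[np.hypot(dx, dy) > max_step * n_w] = 0.0
        total = probs.sum()
        if total > 0:
            idx = rng.choice(probs.size, p=probs.ravel() / total)
            prev = divmod(int(idx), n_w)  # (row, column)
        elif prev is None:
            # Empty first slice: fall back to a uniformly random start.
            prev = (int(rng.integers(n_h)), int(rng.integers(n_w)))
        # Otherwise no saliency is within reach, so stay at the previous point.
        path.append((t, int(prev[0]), int(prev[1])))
    return path
```

A call such as `sample_scanpath(build_saliency_volume(fixations, duration=10.0))` returns one `(slice, row, column)` triple per temporal slice. Setting `max_step` to a large value (for example 1.0) effectively removes the constraint and reduces the sampler to independent per-slice draws.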
Experimental Results
Research Questions
- RQ1: Can temporally aware saliency volumes improve scanpath prediction for 360° images?
- RQ2: What sampling strategy over saliency volumes yields realistic scanpaths?
- RQ3: How does SaltiNet perform compared to other Salient360! entrants?
- RQ4: What are the limitations of sampling-based scanpath generation from volumes, and how can they be mitigated?
Key Findings
- SaltiNet with the distance-limiting sampling strategy (strategy 2) achieves the best score among the sampling strategies (Jarodzka score 2.27; lower is better).
- Compared to random or naive sampling, SaltiNet-based sampling substantially improves scanpath realism (random 4.94; naive 3.45; distance-limited 2.27).
- Sampling ground truth saliency maps/volumes yields even better alignment (1.89 and 1.79, respectively).
- Ground-truth scanpaths score far lower (1.2e-8 in the reported metric), indicating a large gap between generated and true paths. SaltiNet's submission nonetheless outperforms two other Salient360! entrants (SJTU 4.6565; Wuhan University 5.9517).
- Training the model takes about two hours on an NVIDIA GTX Titan X using Keras/Theano.
- SaltiNet won the award for the best scanpath solution at the Salient360! challenge at ICME 2017.
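The scores listed above come from a Jarodzka-style comparison that uses Hungarian matching on 360° images. The sketch below illustrates only the matching step: it pairs the fixations of two scanpaths so that the total great-circle distance is minimal, then reports the mean matched distance in radians. It is not the challenge's official metric, which compares further scanpath properties; the function names and the normalized `(x, y)` input convention are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def to_lonlat(points):
    """Map normalized equirectangular (x, y) in [0, 1] to (lon, lat) in radians."""
    points = np.asarray(points, dtype=np.float64)
    lon = (points[:, 0] - 0.5) * 2.0 * np.pi  # [-pi, pi]
    lat = (0.5 - points[:, 1]) * np.pi        # [-pi/2, pi/2], top of image = +pi/2
    return lon, lat


def great_circle(lon1, lat1, lon2, lat2):
    """Angular distance in radians on the unit sphere (haversine formula)."""
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def matched_distance(path_a, path_b):
    """Mean great-circle distance under the optimal one-to-one matching."""
    lon_a, lat_a = to_lonlat(path_a)
    lon_b, lat_b = to_lonlat(path_b)
    # cost[i, j] = angular distance between fixation i of A and fixation j of B.
    cost = great_circle(lon_a[:, None], lat_a[:, None],
                        lon_b[None, :], lat_b[None, :])
    rows, cols = linear_sum_assignment(cost)  # Hungarian-style assignment
    return float(cost[rows, cols].mean())


if __name__ == "__main__":
    generated = [(0.10, 0.50), (0.30, 0.45), (0.55, 0.60)]
    reference = [(0.12, 0.52), (0.28, 0.40), (0.60, 0.58)]
    print(matched_distance(generated, reference))
```

Measuring distance on the sphere rather than in pixel space matters near the left and right image borders, where two points that look far apart in the equirectangular projection are actually adjacent.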
This review was generated by AI and checked by a human editor.