QUICK REVIEW

[Paper Review] Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs

U. Dinesh Acharya, Zhiwu Huang|arXiv (Cornell University)|Oct 4, 2018

Generative Adversarial Networks and Image Synthesis|References: 2|Cited by: 45

One-Line Summary

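This work extends progressive growing of GANs to high-resolution video generation. It adds spatiotemporal layers progressively and uses a Sliced Wasserstein GAN loss to stabilize training. The approach is demonstrated by generating 256x256x32 videos on a new face-dynamics video dataset.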

ABSTRACT

The extension of image generation to video generation turns out to be a very difficult task, since the temporal dimension of videos introduces an extra challenge during the generation process. Besides, due to the limitation of memory and training stability, the generation becomes increasingly challenging with the increase of the resolution/duration of videos. In this work, we exploit the idea of progressive growing of Generative Adversarial Networks (GANs) for higher resolution video generation. In particular, we begin to produce video samples of low-resolution and short-duration, and then progressively increase both resolution and duration alone (or jointly) by adding new spatiotemporal convolutional layers to the current networks. Starting from the learning on a very raw-level spatial appearance and temporal movement of the video distribution, the proposed progressive method learns spatiotemporal information incrementally to generate higher resolution videos. Furthermore, we introduce a sliced version of Wasserstein GAN (SWGAN) loss to improve the distribution learning on the video data of high-dimension and mixed-spatiotemporal distribution. SWGAN loss replaces the distance between joint distributions by that of one-dimensional marginal distributions, making the loss easier to compute. We evaluate the proposed model on our collected face video dataset of 10,900 videos to generate photorealistic face videos of 256x256x32 resolution. In addition, our model also reaches a record inception score of 14.57 in unsupervised action recognition dataset UCF-101.
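
The abstract's central idea behind the SWGAN loss is to replace a distance between high-dimensional joint distributions with distances between 1-D marginals. A generic Monte Carlo estimator of the sliced Wasserstein distance illustrates this. What follows is a minimal sketch that assumes NumPy and equal-size sample sets. The function name, `num_projections`, and the use of random Gaussian directions are illustrative choices. This is not the paper's SWGAN implementation, whose discriminator-side details this review does not describe.

```python
import numpy as np


def sliced_wasserstein(x, y, num_projections=128, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) holding n flattened samples each
    (for example, features of real and generated videos).
    Projecting onto a random unit direction turns the d-dimensional problem
    into a 1-D one, where optimal transport simply matches sorted values.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions, normalized to lie on the unit sphere in R^d.
    theta = rng.standard_normal((d, num_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project both sample sets onto every direction and sort each 1-D marginal.
    px = np.sort(x @ theta, axis=0)
    py = np.sort(y @ theta, axis=0)
    # 1-D squared W2 between equal-size empirical distributions is the mean
    # squared gap between sorted values; averaging over directions slices it.
    return float(np.mean((px - py) ** 2))
```

Each projection costs only a sort, so the cost grows as O(n log n) per direction rather than requiring a full high-dimensional transport plan. For a pure translation of one sample set by a vector v, the estimate concentrates around ||v||^2 / d.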

Motivation and Objectives

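Address the training instability and memory constraints that arise when generating high-resolution videos.
Propose a progressive growing framework that gradually increases video resolution and duration.
Introduce a Sliced Wasserstein GAN (SWGAN) loss to stabilize distribution learning on high-dimensional video data.
Build a large-scale face-dynamics video dataset (TrailerFaces, about 10.9k clips) for training and evaluation.
Show improvements over existing video GANs in both appearance and dynamics, with competitive Inception Score and FID results.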
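
Growing resolution and duration "alone (or jointly)", in the abstract's words, amounts to a stage plan. The sketch below shows one such plan. Only the 256x256x32 target comes from the source. The 4x4x1 starting shape and the rule "double both axes until each reaches its target" are assumptions chosen for illustration, and `growth_schedule` is a hypothetical helper.

```python
def growth_schedule(start=(4, 1), target=(256, 32)):
    """Hypothetical progressive-growing plan as (height, width, frames) stages.

    start and target are (spatial side, number of frames). Each stage doubles
    both axes jointly; once an axis reaches its target it stays fixed, so the
    remaining stages grow only the other axis.
    """
    side, frames = start
    max_side, max_frames = target
    stages = [(side, side, frames)]
    while side < max_side or frames < max_frames:
        side = min(side * 2, max_side)
        frames = min(frames * 2, max_frames)
        stages.append((side, side, frames))
    return stages


if __name__ == "__main__":
    for stage in growth_schedule():
        print(stage)
```

With these assumed defaults, duration saturates at 32 frames at 128x128. The final stage therefore grows resolution alone. A real schedule could instead interleave resolution-only and duration-only stages.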

Proposed Method

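Extend Progressive Growing of GANs to the spatiotemporal domain of video generation.
Use 3D convolutions and transition (fade-in) phases to add resolution and duration gradually through new layers.
Incorporate Minibatch Standard Deviation and Pixel Normalization to stabilize training.
Adopt the SWGAN loss, which approximates the Wasserstein distance (WD) through 1-D projections, to stabilize learning of high-dimensional distributions.
Build and use the TrailerFaces dataset of 10,910 face-dynamics video clips (the abstract gives 10,900).
Evaluate with Inception Score (IS) and Fréchet Inception Distance (FID) on UCF-101 and in-the-wild datasets.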
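
The fade-in transition, Pixel Normalization, and Minibatch Standard Deviation come from the original image-based progressive GAN. The NumPy sketch below shows how they might look on 5-D video tensors shaped (N, C, T, H, W). Several details are assumptions for illustration, since the paper's layer details are not given in this review: the tensor layout, nearest-neighbour upsampling, `eps=1e-8`, and a single-group minibatch statistic.

```python
import numpy as np


def pixel_norm(x, eps=1e-8):
    # Scale each spatiotemporal position's feature vector to unit RMS across channels.
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)


def minibatch_stddev(x, eps=1e-8):
    # Std of every feature over the batch, averaged to one scalar and
    # appended as an extra constant channel (single group, an assumption).
    std = np.sqrt(np.var(x, axis=0) + eps)  # shape (C, T, H, W)
    n, _, t, h, w = x.shape
    extra = np.full((n, 1, t, h, w), np.mean(std), dtype=x.dtype)
    return np.concatenate([x, extra], axis=1)


def upsample_nearest(x, time_factor=2, space_factor=2):
    # Nearest-neighbour upsampling along T, H and W.
    x = np.repeat(x, time_factor, axis=2)
    x = np.repeat(x, space_factor, axis=3)
    return np.repeat(x, space_factor, axis=4)


def fade_in(old_out, new_out, alpha, time_factor=2, space_factor=2):
    # Blend the upsampled previous-stage output with the new layer's output;
    # alpha ramps from 0 to 1 during the transition phase.
    upsampled = upsample_nearest(old_out, time_factor, space_factor)
    return alpha * new_out + (1.0 - alpha) * upsampled
```

Setting `time_factor=1` or `space_factor=1` covers stages that grow only resolution or only duration.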

Experimental Results

Research Questions

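RQ1: Can progressive growing of GANs be extended effectively to high-resolution, long video sequences?
RQ2: Does the Sliced Wasserstein GAN loss improve the stability and quality of high-dimensional video generation?
RQ3: Which datasets and evaluation metrics best demonstrate improvements in both appearance and dynamics?
RQ4: How does the proposed method compare with existing video GANs (VideoGAN, Temporal GAN, etc.) on standard and in-the-wild datasets?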

Key Findings

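The method generates videos of up to 256x256x32, larger than the 64x64x32 previously reported.
The progressive growing strategy with spatiotemporal layers improves both appearance and dynamics over previous methods.
The SWGAN loss promotes stable learning of high-dimensional video distributions and can be integrated into the progressive framework.
On the UCF-101 action recognition dataset (unsupervised setting), the method achieves a record Inception Score of 14.57.
On two challenging in-the-wild datasets, the method achieves better FID scores than state-of-the-art methods.
The TrailerFaces dataset (10,910 face-dynamics video clips) is newly introduced to support research on high-resolution video generation.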
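
The two metrics reported above have standard definitions. IS is exp(E_x KL(p(y|x) || p(y))) over classifier predictions. FID is the Fréchet distance between Gaussians fitted to real and generated features. The sketch below computes both from precomputed NumPy arrays. This review does not state which pretrained video network supplies the probabilities and features, so that step is left out. The function names are illustrative.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    # probs: (N, K) class probabilities p(y|x), one row per generated sample.
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(mu1, cov1, mu2, cov2):
    # ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2)).
    # Tr((cov1 cov2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov1 @ cov2, which are real and non-negative for PSD covariances;
    # .real and the clip absorb round-off.
    eig = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sum(np.sqrt(np.clip(eig.real, 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)
```

IS rewards confident and diverse predictions: K samples, each assigned with certainty to a different one of K classes, give IS = K. FID is lower when the two feature distributions are closer. This is why the findings report a higher IS and a lower FID as improvements.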

This review was generated by AI and checked by a human editor.