QUICK REVIEW

[Paper Review] Compressing GANs using Knowledge Distillation

Angeline Aguinaldo, Ping-yeh Chiang|arXiv (Cornell University)|Feb 1, 2019

Generative Adversarial Networks and Image Synthesis | References: 19 | Citations: 61

One-Line Summary

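This paper shows that knowledge distillation can compress over-parameterized GANs. On MNIST, CIFAR-10, and Celeb-A it produces small student GANs that match or outperform same-size GANs trained from scratch, while achieving large compression ratios.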

ABSTRACT

Generative Adversarial Networks (GANs) have been used in several machine learning tasks such as domain transfer, super resolution, and synthetic data generation. State-of-the-art GANs often use tens of millions of parameters, making them expensive to deploy for applications in low SWAP (size, weight, and power) hardware, such as mobile devices, and for applications with real time capabilities. There has been no work found to reduce the number of parameters used in GANs. Therefore, we propose a method to compress GANs using knowledge distillation techniques, in which a smaller "student" GAN learns to mimic a larger "teacher" GAN. We show that the distillation methods used on MNIST, CIFAR-10, and Celeb-A datasets can compress teacher GANs at ratios of 1669:1, 58:1, and 87:1, respectively, while retaining the quality of the generated image. From our experiments, we observe a qualitative limit for GAN's compression. Moreover, we observe that, with a fixed parameter budget, compressed GANs outperform GANs trained using standard training methods. We conjecture that this is partially owing to the optimization landscape of over-parameterized GANs which allows efficient training using alternating gradient descent. Thus, training an over-parameterized GAN followed by our proposed compression scheme provides a high quality generative model with a small number of parameters.

Motivation and Objectives

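Address the computational cost of large GANs on low-SWAP (size, weight, and power) hardware and in real-time applications.
Introduce knowledge distillation tailored to GANs, compressing the generator network while preserving image quality.
Empirically evaluate compression on MNIST, CIFAR-10, and Celeb-A, using Inception Score (IS) and Frechet Inception Distance (FID) as quality metrics.
Analyze the limits of GAN compression and the role of over-parameterization in successful distillation.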

Proposed Method

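Use a teacher-student framework in which a large, over-parameterized GAN (the teacher) guides a smaller GAN (the student).
Train the student under two schemes: (i) an MSE loss that minimizes the pixel-wise distance to the teacher's output; (ii) a joint loss that combines the GAN objective with the MSE term to align the student's output with the teacher's.
Select the teacher network by training a range of sizes and keeping the best one according to Inception Score and FID.
Control model size through a depth scale factor d, and explore teacher sizes and their corresponding parameter counts.
Evaluate compression with Inception Score, Frechet Inception Distance, and, for blur, the variance of the Laplacian (VoL).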
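
The two student training schemes above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the non-saturating form of the generator loss is an assumption, and the `mse_weight` parameter is an assumed way of combining the two terms (this review does not state how the paper weights them). NumPy is used only to compute loss values; a real training loop would need an autodiff framework.

```python
import numpy as np

def mse_distillation_loss(student_images, teacher_images):
    """Scheme (i): mean squared pixel-wise distance between the student's
    and the teacher's outputs for the same latent vectors."""
    s = np.asarray(student_images, dtype=np.float64)
    t = np.asarray(teacher_images, dtype=np.float64)
    return float(np.mean((s - t) ** 2))

def generator_gan_loss(disc_logits_on_student):
    """ASSUMPTION: non-saturating generator loss, -log(sigmoid(logit)),
    averaged over the batch. logaddexp keeps it numerically stable."""
    logits = np.asarray(disc_logits_on_student, dtype=np.float64)
    return float(np.mean(np.logaddexp(0.0, -logits)))

def joint_loss(student_images, teacher_images, disc_logits_on_student,
               mse_weight=1.0):
    """Scheme (ii): GAN objective plus the MSE term.
    `mse_weight` is a hypothetical knob, not a value from the paper."""
    gan_term = generator_gan_loss(disc_logits_on_student)
    mse_term = mse_distillation_loss(student_images, teacher_images)
    return gan_term + mse_weight * mse_term

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_out = rng.uniform(-1.0, 1.0, size=(4, 28, 28))  # stand-in teacher images
    student_out = teacher_out + 0.1 * rng.standard_normal(teacher_out.shape)
    logits = rng.standard_normal(4)  # stand-in discriminator logits
    print("MSE only:", mse_distillation_loss(student_out, teacher_out))
    print("Joint   :", joint_loss(student_out, teacher_out, logits))
```

In both schemes the teacher and the student are fed the same latent vectors, so the MSE term compares images that are supposed to depict the same sample.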

Experimental Results

Research Questions

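RQ1: Can a student GAN with far fewer parameters reproduce the teacher GAN's generator function across the latent space?
RQ2: What compression ratios are achievable on MNIST, CIFAR-10, and Celeb-A without a serious loss of image quality?
RQ3: Does knowledge distillation beat training a similarly sized GAN from scratch in terms of IS, FID, and sharpness?
RQ4: What are the visual and quantitative limits of GAN compression for datasets of different complexity?
RQ5: How does the joint GAN+MSE loss compare with MSE alone in compression quality, particularly in image sharpness?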

Key Findings

GAN size (d)	Parameters	MNIST ratio	MNIST IS (student)	MNIST IS (regular)	CIFAR-10 ratio	CIFAR-10 FID (student)	CIFAR-10 FID (regular)	Celeb-A ratio	Celeb-A FID (student)	Celeb-A FID (regular)
2	28,351	1669:1	5.80	1.86	126:1	11.76	38.72	446:1	12.15	45.49
4	62,077	762:1	6.41	3.63	58:1	11.00	14.28	204:1	10.97	18.72
8	145,657	325:1	6.60	4.73	25:1	9.57	11.85	87:1	8.78	11.06
16	377,329	125:1	6.83	5.07	9:1	8.39	9.90	34:1	6.29	9.14
32	1,098,721	43:1	6.87	6.08	3:1	7.80	7.86	12:1	4.84	5.05
48	2,164,177	—	—	—	2:1	7.58	—	6:1	4.54	—
64	3,573,697	—	6.93	6.51	—	—	—	—	—	—
128	12,652,417	4:1	6.97	6.63	—	—	—	—	—	—
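
The ratio columns can be read as teacher parameters divided by student parameters, rounded to the nearest integer. The sketch below checks that reading against the table. The teacher sizes are an assumption: this review does not state them, but taking the d=64 model as the CIFAR-10 teacher and the d=128 model as the Celeb-A teacher reproduces every listed ratio for those two datasets. The MNIST teacher is not among the listed sizes, so it is left out.

```python
# Parameter counts copied from the table (depth scale factor d -> parameters).
PARAMS = {
    2: 28_351, 4: 62_077, 8: 145_657, 16: 377_329,
    32: 1_098_721, 48: 2_164_177, 64: 3_573_697, 128: 12_652_417,
}

# ASSUMPTION: teacher sizes inferred from the table, not stated in the review.
ASSUMED_TEACHER_D = {"CIFAR-10": 64, "Celeb-A": 128}

def compression_ratio(teacher_params, student_params):
    """Teacher-to-student parameter ratio, rounded to the nearest integer."""
    return round(teacher_params / student_params)

def ratios_for(dataset, student_ds=(2, 4, 8, 16, 32, 48)):
    """Compression ratio of each student size against the assumed teacher."""
    teacher_params = PARAMS[ASSUMED_TEACHER_D[dataset]]
    return {d: f"{compression_ratio(teacher_params, PARAMS[d])}:1"
            for d in student_ds}

if __name__ == "__main__":
    for name in ASSUMED_TEACHER_D:
        print(name, ratios_for(name))
```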

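Student GANs consistently outperform regular GANs of the same small size on every dataset.
On MNIST, compression reaches 1,669:1 while retaining 83% of the teacher's Inception Score.
On CIFAR-10 and Celeb-A, compression reaches notable ratios (58:1 and 87:1, respectively) with competitive FID scores.
Compressed students approximate the teacher's generator function across the latent space, which suggests knowledge transfer rather than memorization.
The joint loss slightly improves FID and produces markedly sharper images (higher VoL) than MSE-only training, although some blur remains at high compression on more complex data.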
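
For readers unfamiliar with the sharpness measure, here is a minimal sketch of the variance of the Laplacian (VoL) on a grayscale image. It assumes the common 4-neighbor 3x3 Laplacian kernel evaluated on interior pixels only; the paper's exact kernel and border handling are not given in this review, and the function name is hypothetical. Sharper images have stronger second derivatives at edges, so a higher VoL indicates less blur.

```python
import numpy as np

def variance_of_laplacian(image):
    """Variance of the 4-neighbor discrete Laplacian over interior pixels.
    ASSUMPTION: kernel [[0,1,0],[1,-4,1],[0,1,0]], no border padding."""
    img = np.asarray(image, dtype=np.float64)
    laplacian = (
        img[:-2, 1:-1] + img[2:, 1:-1]      # pixels above and below
        + img[1:-1, :-2] + img[1:-1, 2:]    # pixels left and right
        - 4.0 * img[1:-1, 1:-1]             # center pixel
    )
    return float(laplacian.var())

if __name__ == "__main__":
    # A hard vertical edge versus a crudely smoothed copy of it.
    sharp = np.zeros((8, 8))
    sharp[:, 4:] = 1.0
    blurred = (sharp + np.roll(sharp, 1, axis=1)) / 2.0
    print("sharp  :", variance_of_laplacian(sharp))
    print("blurred:", variance_of_laplacian(blurred))
```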


This review was generated by AI and checked by a human editor.