QUICK REVIEW

[Paper Review] Dataset Distillation via Factorization

Songhua Liu, Kai Wang|arXiv (Cornell University)|Oct 30, 2022

Image Processing Techniques and Applications | Cited by 59

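One-Line Summary

HaBa introduces a hallucinator-basis factorization for dataset distillation. It produces expressive synthetic data with fewer parameters and improves downstream performance, including gains in cross-architecture settings. It also adds adversarial contrastive constraints that increase diversity and informativeness, and it can be plugged into existing DD baselines.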

ABSTRACT

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing DD baseline. Unlike conventional DD approaches that aim to produce distilled and representative samples, HaBa explores decomposing a dataset into two components: data Hallucination networks (Ha) and Bases (Ba), where the latter is fed into the former to reconstruct image samples. The flexible combinations between bases and hallucination networks therefore equip the distilled data with exponential informativeness gain, which largely increases the representation capability of distilled datasets. To further increase the data efficiency of compression results, we introduce a pair of adversarial contrastive constraints on the resultant hallucination networks and bases, which increase the diversity of generated images and inject more discriminant information into the factorization. Extensive comparisons and experiments demonstrate that our method can yield significant improvement on downstream classification tasks compared with previous state-of-the-art methods, while reducing the total number of compressed parameters by up to 65%. Moreover, datasets distilled by our approach also achieve ~10% higher accuracy than baseline methods in cross-architecture generalization. Our code is available at https://github.com/Huage001/DatasetFactorization.
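The storage argument behind the factorization can be made concrete with a little arithmetic. The sketch below assumes every basis is paired with every hallucinator, so |B| bases and |H| hallucinators yield |B|·|H| training images while only |B| + |H| components are stored. The function names and all sizes are hypothetical and are not the configuration reported in the paper.

```python
def factorized_budget(num_bases: int, num_hallucinators: int,
                      params_per_basis: int, params_per_hallucinator: int) -> dict:
    """Count images produced and values stored for a hypothetical HaBa-style setup."""
    # Assumption: every (basis, hallucinator) pair renders exactly one image.
    num_images = num_bases * num_hallucinators
    # Only the bases and the hallucinator weights are stored, not the rendered images.
    stored = num_bases * params_per_basis + num_hallucinators * params_per_hallucinator
    return {"images": num_images, "stored_params": stored}


def raw_budget(num_images: int, params_per_image: int) -> int:
    """Values needed to store the same number of images directly as pixels."""
    return num_images * params_per_image


if __name__ == "__main__":
    # Hypothetical sizes: 32x32 RGB bases and small hallucinators.
    pixels = 3 * 32 * 32  # 3072 values per image
    fb = factorized_budget(num_bases=20, num_hallucinators=5,
                           params_per_basis=pixels, params_per_hallucinator=1000)
    print(fb)                                  # {'images': 100, 'stored_params': 66440}
    print(raw_budget(fb["images"], pixels))    # 307200
```

With these made-up numbers, 100 images come from 66,440 stored values instead of 307,200; the actual savings depend on the basis resolution and the hallucinator size.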

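Motivation and Objectives

Address data and storage efficiency in dataset distillation (DD).
Propose a factorization that decomposes the synthetic data into bases and hallucinators, increasing the informativeness of the distilled data.
Introduce adversarial contrastive constraints to increase the diversity of the generated data and reduce redundancy.
Demonstrate plug-in compatibility with existing DD baselines and show performance improvements.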

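Proposed Method

The synthetic data is factorized into a set of bases B and a set of hallucinators H, written S = {H_θj} ∪ {(x̂_i, ŷ_i)}, where H_θj is the j-th hallucinator with parameters θ_j and (x̂_i, ŷ_i) is the i-th basis paired with its label.
Each hallucinator takes a basis as input and outputs a hallucinated image through an encoder, an affine transformation (learnable scale and shift), and a decoder.
An adversarial contrastive loss L_cos and an (optionally supervised) contrastive loss L_con are introduced to maximize diversity among images that share the same basis and to reduce redundancy.
The task loss L_task and the DD objective L_DD are incorporated, and the components are trained alternately in an end-to-end differentiable pipeline. HaBa can be plugged into existing DD objectives.
Optionally, HaBa is combined with IDC, a concurrent efficient data parameterization method, and cross-architecture generalization is evaluated.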
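The hallucinator pipeline described above (encoder, affine scale and shift, decoder) and the pairing of bases with hallucinators can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the encoder and decoder are single linear maps with a ReLU on flattened vectors, the weights are random rather than learned, and the names `Hallucinator` and `render_all` are hypothetical. Each rendered image inherits the label of the basis it came from.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real hallucinator would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Hallucinator:
    """Hypothetical toy hallucinator: encoder -> affine (scale, shift) -> decoder."""

    def __init__(self, dim: int, hidden: int, seed: int) -> None:
        rng = random.Random(seed)
        self.enc = random_matrix(hidden, dim, rng)                        # encoder weights
        self.scale = [1.0 + rng.gauss(0.0, 0.1) for _ in range(hidden)]   # affine scale
        self.shift = [rng.gauss(0.0, 0.1) for _ in range(hidden)]         # affine shift
        self.dec = random_matrix(dim, hidden, rng)                        # decoder weights

    def __call__(self, basis: Vector) -> Vector:
        # Encode the flattened basis and apply a ReLU non-linearity.
        h = [max(0.0, z) for z in matvec(self.enc, basis)]
        # Affine transformation: element-wise scale and shift of the features.
        h = [s * z + b for s, z, b in zip(self.scale, h, self.shift)]
        # Decode back to the flattened image space.
        return matvec(self.dec, h)


def render_all(bases: List[Tuple[Vector, int]],
               hallucinators: List[Hallucinator]) -> List[Tuple[Vector, int]]:
    """Pair every labeled basis with every hallucinator; images keep the basis label."""
    return [(h(x), y) for x, y in bases for h in hallucinators]


if __name__ == "__main__":
    dim, hidden = 12, 4  # toy flattened "image" size and feature size
    rng = random.Random(0)
    bases = [([rng.random() for _ in range(dim)], label) for label in (0, 0, 1)]
    hallucinators = [Hallucinator(dim, hidden, seed=s) for s in range(4)]
    images = render_all(bases, hallucinators)
    print(len(images))  # 12 (3 bases x 4 hallucinators)
```

Because each hallucinator carries its own affine parameters, the same basis is rendered differently by each one, which is where the extra diversity comes from in this sketch.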

Experimental Results

Research Questions

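RQ1 Can HaBa improve downstream model performance compared with state-of-the-art DD baselines under the same storage budget?
RQ2 Does factorization into bases and hallucinators increase data diversity and informativeness without increasing storage?
RQ3 How does HaBa affect cross-architecture generalization (training on one architecture and evaluating on others)?
RQ4 What effect do the adversarial contrastive constraints have on performance and diversity?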

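Key Findings

HaBa shows significant improvements over previous DD methods across the SVHN, CIFAR10, and CIFAR100 benchmarks.
While improving accuracy, HaBa reduces the total number of compressed parameters by up to 65%.
In cross-architecture generalization, HaBa achieves about 10% higher accuracy than baseline methods.
Bases capture core structure while hallucinators render diverse styles, increasing data diversity without storing additional images.
HaBa yields consistent gains when built on top of several DD baselines (DC, DM, MTT) and delivers cross-architecture gains across several networks (ConvNet, ResNet, VGG, AlexNet).
Qualitative visualizations show that different hallucinators generate diverse images from a shared basis, increasing the informativeness of the dataset.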
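The diversity findings, like the L_cos term in the method, concern how similar the images rendered from one basis are to each other. A minimal diagnostic, assuming each image has already been mapped to a plain feature vector, is the mean pairwise cosine similarity among images from a single basis: values near 1 mean the hallucinations are nearly redundant, lower values mean they are more diverse. The function names are hypothetical, and this is only a measurement in the spirit of L_cos, not the paper's loss, whose exact form (including its adversarial component) is defined in the paper.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mean_pairwise_cosine(features: List[Vector]) -> float:
    """Average cosine similarity over all pairs of features rendered from one basis."""
    pairs = list(combinations(features, 2))
    if not pairs:
        raise ValueError("need at least two feature vectors")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    redundant = [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # all point the same way
    diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # spread-out directions
    print(mean_pairwise_cosine(redundant))  # 1.0
    print(mean_pairwise_cosine(diverse))    # about -0.333
```

A training objective that pushes this quantity down for images sharing a basis would encourage hallucinators to differ from one another, which is the intuition behind the contrastive constraints.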

This review was generated by AI and checked by a human editor.