QUICK REVIEW

[Paper Review] Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms

Mahdi Haghifam, Jeffrey Negrea|arXiv (Cornell University)|Apr 27, 2020

Sparse and Compressive Sensing Techniques|References: 23|Citations: 40

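One-Sentence Summary

This paper relates the conditional mutual information CMI^k_D to existing information-theoretic generalization measures, proves that bounds based on CMI^k_D are tighter than those based on IOMI, and applies them to Langevin dynamics, proposing a new prior for the generalization bound that learns the dataset indices from the optimization trajectory.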

ABSTRACT

The information-theoretic framework of Russo and J. Zou (2016) and Xu and Raginsky (2017) provides bounds on the generalization error of a learning algorithm in terms of the mutual information between the algorithm's output and the training sample. In this work, we study the proposal, by Steinke and Zakynthinou (2020), to reason about the generalization error of a learning algorithm by introducing a super sample that contains the training sample as a random subset and computing mutual information conditional on the super sample. We first show that these new bounds based on the conditional mutual information are tighter than those based on the unconditional mutual information. We then introduce yet tighter bounds, building on the "individual sample" idea of Bu, S. Zou, and Veeravalli (2019) and the "data dependent" ideas of Negrea et al. (2019), using disintegrated mutual information. Finally, we apply these bounds to the study of Langevin dynamics algorithm, showing that conditioning on the super sample allows us to exploit information in the optimization trajectory to obtain tighter bounds based on hypothesis tests.
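The two frameworks contrasted in the abstract can be written side by side. The block below is a sketch of the commonly cited forms of the two baseline bounds, not a restatement of this paper's theorems. The notation (n, S, W, U, \tilde{Z}, gen) is assumed here for illustration and does not come from the review.

```latex
% Assumed notation (not from the review): n = training-set size,
% S = training sample, W = algorithm output, \tilde{Z} = super sample,
% U = indices selecting S from \tilde{Z}, gen = generalization error.
\[
  \bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr|
  \le \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)}
  \qquad \text{(Xu and Raginsky, $\sigma$-subgaussian loss)}
\]
\[
  \bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr|
  \le \sqrt{\frac{2}{n}\, I(W;U \mid \tilde{Z})}
  \qquad \text{(Steinke and Zakynthinou, loss in $[0,1]$)}
\]
```

One reason the conditional form is attractive: I(W;S) can be infinite (for example, a deterministic algorithm on continuous data), whereas I(W;U | \tilde{Z}) is at most n log 2 in the two-column super-sample construction, because U carries only n bits.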

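Motivation and Objectives

Characterize the relationship between IOMI and CMI^k_D across learning scenarios.
Derive generalization bounds based on mutual information with random index subsets.
Apply the bounds to noisy, iterative algorithms (Langevin dynamics) using a prior for the generalization bound that incorporates trajectory information.
Show empirically that the bounds are sharper than earlier information-theoretic bounds, particularly in the later stages of training.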

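Proposed Method

Define CMI^k_D(A) = I(W; U^(k) | Z̃^(k)), where W is the algorithm's output, U^(k) the random indices that select the training sample, and Z̃^(k) the super sample, and relate it to IOMI_D(A) and to notions of stability.
Prove that CMI^k_D(A) ≤ IOMI_D(A) for every data distribution, algorithm, and k, and show that CMI^k_D(A) → IOMI_D(A) as k → ∞ when the parameter space is finite.
Establish two new generalization bounds using the random-index and super-sample constructions from prior work.
Construct a prior and a posterior for the Langevin dynamics generalization bound, with the prior learning the index values from the optimization trajectory.
Provide an empirical comparison showing that the bounds are tighter than earlier ones, especially in the later stages of training.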
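The inequality CMI^k_D(A) ≤ IOMI_D(A) from the list above can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's code. It assumes binary data drawn i.i.d. from a Bernoulli(p) distribution, a deterministic algorithm, and a super sample laid out as n rows of k points with one uniformly chosen column index per row (for k = 2 this is the Steinke and Zakynthinou construction). The function names `entropy`, `mutual_information`, and `iomi_and_cmi` are hypothetical helpers introduced for illustration.

```python
import itertools
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def mutual_information(joint):
    """I(X; Y) in nats from a dict mapping (x, y) pairs to probabilities."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return entropy(px) + entropy(py) - entropy(joint)


def iomi_and_cmi(algorithm, p=0.5, n=2, k=2):
    """Exact IOMI = I(W; S) and CMI^k = I(W; U | Z~) for Bernoulli(p) data.

    algorithm: deterministic map from a length-n tuple of bits to an output W.
    Assumed layout: the super sample has n rows and k columns, and U picks
    one column per row uniformly at random.
    """
    def prob(bits):
        return math.prod(p if b else 1 - p for b in bits)

    # IOMI: enumerate every possible training sample S in {0,1}^n.
    joint_ws = defaultdict(float)
    for s in itertools.product((0, 1), repeat=n):
        joint_ws[(algorithm(s), s)] += prob(s)
    iomi = mutual_information(joint_ws)

    # CMI: average I(W; U | Z~ = z) over every possible super sample z.
    cmi = 0.0
    p_u = 1.0 / k ** n
    for flat in itertools.product((0, 1), repeat=n * k):
        rows = [flat[i * k:(i + 1) * k] for i in range(n)]
        joint_wu = defaultdict(float)
        for u in itertools.product(range(k), repeat=n):
            s = tuple(rows[i][u[i]] for i in range(n))
            joint_wu[(algorithm(s), u)] += p_u
        cmi += prob(flat) * mutual_information(joint_wu)
    return iomi, cmi


# Toy algorithm: output the number of ones in the training sample.
iomi, cmi = iomi_and_cmi(sum)
print(f"IOMI = {iomi:.4f} nats, CMI^2 = {cmi:.4f} nats")
```

With the defaults (p = 0.5, n = 2, k = 2), the conditional quantity comes out at roughly 0.61 nats against roughly 1.04 nats for the unconditional one. Exhaustive enumeration is exponential in n·k, so this only works for very small toy settings.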

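Experimental Results

Research Questions

RQ1: How does CMI^k_D(A) compare with IOMI_D(A) across learning scenarios and values of k?
RQ2: Can generalization bounds be derived that tie the generalization error to mutual information with random index subsets?
RQ3: Can these bounds be applied effectively to noisy, iterative algorithms such as Langevin dynamics using a trajectory-informed prior?
RQ4: Do the bounds remain non-vacuous and non-divergent over long training runs in overfitting scenarios?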

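Key Findings

CMI^k_D(A) is no larger than IOMI_D(A) for every data distribution, algorithm, and k.
CMI^k_D(A) converges to IOMI_D(A) as k grows, when the parameter space is finite.
Two new bounds that tie generalization to mutual information with random index subsets and super samples are tighter than the CMI^k_D(A)-based bound.
A prior for the Langevin dynamics generalization bound is introduced that learns the dataset indices from the optimization trajectory.
Empirically, the new bounds outperform existing bounds, particularly in the later stages of training and under strong overfitting.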
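For readers unfamiliar with the algorithm these findings refer to, here is a minimal sketch of a Langevin dynamics iteration, assuming the standard update w ← w − η·∇L_S(w) + sqrt(2η/β)·ξ with Gaussian noise ξ. The quadratic loss, the hyperparameter values, and the function name `langevin_dynamics` are illustrative assumptions, not the paper's experimental setup. The function returns the full trajectory because the bounds discussed above draw on the optimization trajectory rather than only the final iterate.

```python
import math
import random


def langevin_dynamics(data, steps=200, step_size=0.05, inverse_temp=50.0, seed=0):
    """Full-batch Langevin dynamics on the empirical risk mean((w - z)^2 / 2).

    Returns the trajectory [w_0, w_1, ..., w_T] as a list of floats.
    All defaults are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step_size / inverse_temp)
    w = 0.0
    trajectory = [w]
    for _ in range(steps):
        # Gradient of the empirical risk at the current iterate.
        grad = sum(w - z for z in data) / len(data)
        # Gradient step plus isotropic Gaussian noise.
        w = w - step_size * grad + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(w)
    return trajectory


trajectory = langevin_dynamics([1.0, 2.0, 3.0])
print(f"final iterate: {trajectory[-1]:.3f}")
```

For this loss the iterates hover around the sample mean of the data, with a spread controlled by the step size and the inverse temperature.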


This review was generated by AI and checked by a human editor.