QUICK REVIEW

[論文レビュー] Test-Time Adaptation via Conjugate Pseudo-labels

Sachin Goyal, Ming-Jie Sun|arXiv (Cornell University)|Jul 20, 2022

Domain Adaptation and Few-Shot Learning被引用数 24

ひとこと要約

本論文は、test-time adaptation (TTA) のための conjugate pseudo-labels（Conjugate PL）を導入し、訓練損失の凸共役から導かれる損失を示す。クロスエントロピーで訓練されたモデルは自然に softmax-entropy（TENT風）損失を選好する一方、二乗損失を用いたモデルは負の二乗損失を好む。経験的には Conjugate PL は複数のデータセットと損失にわたり TTA を改善し、PolyLoss を含む、そして conjugate pseudo-labels を用いた自己訓練スキームとして解釈できる。

ABSTRACT

Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts, with access to only the unlabeled test samples from the new domain at test-time. Prior TTA methods optimize over unsupervised objectives such as the entropy of model predictions in TENT [Wang et al., 2021], but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising phenomenon: if we attempt to meta-learn the best possible TTA loss over a wide class of functions, then we recover a function that is remarkably similar to (a temperature-scaled version of) the softmax-entropy employed by TENT. This only holds, however, if the classifier we are adapting is trained via cross-entropy; if trained via squared loss, a different best TTA loss emerges. To explain this phenomenon, we analyze TTA through the lens of the training losses's convex conjugate. We show that under natural conditions, this (unsupervised) conjugate function can be viewed as a good local approximation to the original supervised loss and indeed, it recovers the best losses found by meta-learning. This leads to a generic recipe that can be used to find a good TTA loss for any given supervised training loss function of a general class. Empirically, our approach consistently dominates other baselines over a wide range of benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently-proposed PolyLoss, where it differs substantially from (and outperforms) an entropy-based loss. Further, we show that our approach can also be interpreted as a kind of self-training using a very specific soft label, which we refer to as the conjugate pseudolabel. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate.

研究の動機と目的

分布シフト下でラベルなしテストデータを前提とした場合に、良い TTA 損失をどう選択または導出するかを動機づけ・理解する。
TTA 損失設計を教師付き損失の凸共役と結びつけ、エントロピー様の損失が最適となる状況や他の損失が有利となる状況を説明する。
さまざまな訓練損失（例：クロスエントロピー、二乗損失、PolyLoss）に対して良い TTA 損失を得るための実用的で一般的なレシピを提供する。
提案手法 Conjugate PL が conjugate pseudo-labels を用いた自己訓練スキームに対応し、ベンチマーク全体で経験的な向上をもたらすことを示す。

提案手法

supervised loss の凸共役を用いて TTA 損失を定式化し、L_conj(h(x)) = -f^*(∇f(h(x))) を示す。
一般的な損失に対して特化する：f(h)=log sum exp(h) を用いるクロスエントロピーでは softmax-entropy が L_conj となり、TENT に一致する。
squared loss の場合は f(h)=½||h||^2 として L_conj が負の二乗ノルムになることを示し、代替的なメタ学習済み損失を説明する。
Conjugate PL を conjugate pseudo-labels = y_hat^CPL = ∇f(h(x)) を用いた自己訓練として解釈する。
PolyLoss に拡張する場合を、標準形または展開した共役形のいずれかで表現し、非標準損失に対しても CPL を可能にする。
アルゴリズム（Conjugate PL）を提供し、ラベルなしのテストバッチ上で CPL による自己訓練でモデルパラメータを更新し、実用的な改善として温度スケーリングを適用する。

実験結果

リサーチクエスチョン

RQ1 テスト時の分布シフト下で、与えられた教師付き訓練損失に対して良い TTA 損失の principled な形は何か。
RQ2 なぜエントロピーベースの損失がクロスエントロピーで訓練されたモデルに対して良く機能し、代替損失が好ましくなるのはどんなときか。
RQ3 凸共役が多様な訓練損失（例：PolyLoss、二乗損失）に対して TTA 損失を導く普遍的な枠組みを提供できるか。
RQ4 conjugate pseudo-labeling は自己訓練とどう関連し、異なる損失に対してどのような擬似ラベルが最適か。

主な発見

メタ学習的に最適な TTA 損失を得ると、クロスエントロピーで訓練された分類器には温度付き softmax-entropy が、二乗損失で訓練された分類器には負の二乗損失が回復する傾向がある。
凸共役フレームワークは、なぜクロスエントロピーに対して softmax-entropy が現れ、他の損失が別の TTA 損失を生むのかを説明し、新しい損失の一般的レシピを提供する。
Conjugate PL は CIFAR-10/100-C および ImageNet-C、さらにはドメイン適応タスクでベースラインの TTA 損失（例：エントロピー最小化、ロバスト擬似ラベル、MEMO）を一貫して上回る。
Conjugate PL を PolyLoss や二乗損失分類器に適用すると顕著な利得が生まれ、ソース訓練損失が標準のクロスエントロピーと異なる場合の利点を示す。
本手法は conjugate pseudo-labels を用いた自己訓練として解釈でき、原理的で広く適用可能な TTA フレームワークを提供する。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。