QUICK REVIEW

[Paper Review] Towards Stable Test-Time Adaptation in Dynamic Wild World

Shuaicheng Niu, Jiaxiang Wu|arXiv (Cornell University)|Feb 24, 2023

Domain Adaptation and Few-Shot Learning|Cited by 62

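One-Sentence Summary

This paper proposes SAR, a sharpness-aware and reliable entropy minimization method that stabilizes fully test-time adaptation (TTA) under wild test conditions by filtering out noisy samples and encouraging flat minima.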

ABSTRACT

Test-time adaptation (TTA) has shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. However, the online model updating of TTA may be unstable and this is often a key obstacle preventing existing TTA methods from being deployed in the real world. Specifically, TTA may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, and 3) online imbalanced label distribution shifts, which are quite common in practice. In this paper, we investigate the unstable reasons and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers many failure cases. By digging into the failure cases, we find that certain noisy test samples with large gradients may disturb the model adaption and result in collapsed trivial solutions, i.e., assigning the same class label for all samples. To address the above collapse issue, we propose a sharpness-aware and reliable entropy minimization method, called SAR, for further stabilizing TTA from two aspects: 1) remove partial noisy samples with large gradients, 2) encourage model weights to go to a flat minimum so that the model is robust to the remaining noisy samples. Promising results demonstrate that SAR performs more stably over prior methods and is computationally efficient under the above wild test scenarios.

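Motivation and Objectives

Motivate and analyze the stability challenges of test-time adaptation (TTA) under realistic wild test settings.
Identify batch normalization (BN) as a key obstacle to stable TTA and evaluate batch-agnostic normalization layers (GN/LN).
Develop a robust optimization framework (SAR) that achieves reliable online adaptation by filtering noisy samples and encouraging flat minima.
Empirically validate the effect of normalization layers on ImageNet-C across several wild scenarios and compare SAR with state-of-the-art TTA methods.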

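Proposed Method

Show that biased mean/variance estimates in BN under small batches and distribution shifts hinder stable TTA.
Study normalization layers and recommend batch-agnostic ones (GN/LN) for TTA.
Propose reliable entropy minimization, which adapts only on samples whose entropy E(x;Θ) is below a threshold E0.
Define the sharpness-aware entropy E^SA as the maximum entropy under weight perturbations within radius ρ, so that minimizing it encourages flat minima.
Minimize S(x)E^SA(x;Θ) via bi-level optimization, where S(x) selects reliable samples by entropy, and reset the model with a recovery scheme when collapse is detected.
Restrict parameter updates to the affine parameters of GN/LN layers, following the Tent/EATA setup for efficiency.
Compare SAR with MEMO, DDA, Tent, and EATA on ImageNet-C under mixed shifts, small batch sizes, and online label imbalance.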
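
The two core ingredients listed above, entropy-based sample selection S(x) and the sharpness-aware entropy E^SA, can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a hypothetical linear softmax classifier with weights `W` (the paper instead updates only the affine parameters of GN/LN layers), the inner maximization uses the usual first-order approximation from sharpness-aware minimization (one normalized gradient-ascent step of length `rho`), and the threshold `e0 = 0.4 * ln(C)`, the names `sar_like_step` and `reliable_entropy_grad`, and the values of `rho` and `lr` are all assumptions made for this sketch.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Per-sample prediction entropy E(x; W).
    return -(probs * np.log(probs + eps)).sum(axis=1)

def reliable_entropy_grad(W, x, e0, eps=1e-12):
    """Gradient w.r.t. W of the mean entropy over samples with entropy < e0."""
    probs = softmax(x @ W)
    ent = entropy(probs)
    reliable = ent < e0                      # S(x): keep low-entropy samples only
    if not reliable.any():
        return np.zeros_like(W), reliable
    # For softmax outputs, dE/dz_j = -p_j * (log p_j + E).
    dlogits = -probs * (np.log(probs + eps) + ent[:, None])
    dlogits *= reliable[:, None] / reliable.sum()
    return x.T @ dlogits, reliable

def sar_like_step(W, x, e0, rho=0.05, lr=0.1):
    """One sharpness-aware, reliability-filtered entropy-minimization step."""
    grad, _ = reliable_entropy_grad(W, x, e0)
    norm = np.linalg.norm(grad)
    if norm == 0.0:                          # no reliable samples: skip the update
        return W.copy()
    W_perturbed = W + rho * grad / norm      # approximate worst case within radius rho
    # This sketch re-applies the entropy filter at the perturbed weights.
    grad_sa, _ = reliable_entropy_grad(W_perturbed, x, e0)
    return W - lr * grad_sa                  # descend using the perturbed-point gradient

# Toy usage: 3 classes, 3-dimensional features (all values are illustrative).
num_classes = 3
e0 = 0.4 * np.log(num_classes)               # assumed threshold for this sketch
x = np.eye(3)
W = 5.0 * np.eye(3)                          # confident predictions: all samples reliable
W_new = sar_like_step(W, x, e0)
W_flat = np.zeros((3, 3))                    # uniform predictions: all samples filtered out
W_flat_new = sar_like_step(W_flat, x, e0)
```

In the confident case the update lowers the mean entropy slightly; in the uniform case every sample exceeds the threshold, so the weights are left untouched. That second behaviour is the point of the filter: high-entropy samples never contribute a gradient.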

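Experimental Results

Research Questions

RQ1: How does the choice of normalization layer (BN, GN, LN) affect the stability of online TTA under wild test conditions?
RQ2: Can a single, efficient online optimization procedure overcome the model collapse and instability of GN/LN-based TTA?
RQ3: Does selectively filtering high-gradient/noisy samples and encouraging flat minima improve the robustness of entropy-based TTA methods?
RQ4: How does SAR perform relative to state-of-the-art TTA methods under mixed shifts, small batch sizes, and online imbalanced label shifts?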

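Key Findings

Batch-agnostic normalization layers (GN/LN) are more stable than BN in wild test scenarios, but failure cases remain.
SAR improves stability by removing high-gradient/noisy samples based on entropy and enforcing sharpness-aware (flat) minima during adaptation.
In the ImageNet-C mixed-shift scenario, SAR with GN and LN models achieves competitive or better accuracy than MEMO, DDA, Tent, and EATA, and the same holds under online imbalanced label shifts.
Under mixed corruptions at severity levels 5 and 3, SAR obtains the best average accuracy among the evaluated methods for both GN and LN models.
With batch size 1, SAR often achieves the best results across corruption types and models, whereas MEMO and DDA incur higher computational cost.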
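
The first finding, that batch-agnostic layers are less sensitive to the test batch than BN, can be illustrated with a small NumPy sketch. It assumes BN normalizes with statistics of the current test batch (as in Tent-style adaptation) and omits the learnable affine parameters; the functions `batch_norm` and `group_norm`, the feature shape `(N, C)`, and the random data are hypothetical choices for illustration only.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (N, C). Statistics are computed per channel over the current batch.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def group_norm(x, num_groups, eps=1e-5):
    # x: (N, C). Statistics are computed per sample within each channel group.
    n, c = x.shape
    grouped = x.reshape(n, num_groups, c // num_groups)
    mean = grouped.mean(axis=2, keepdims=True)
    var = grouped.var(axis=2, keepdims=True)
    return ((grouped - mean) / np.sqrt(var + eps)).reshape(n, c)

rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 8))
# The same sample placed in two batches drawn from different distributions.
batch_a = np.vstack([sample, rng.normal(size=(3, 8))])
batch_b = np.vstack([sample, rng.normal(loc=3.0, size=(3, 8))])

# GN output for the sample does not depend on its batch mates; BN output does.
gn_same = np.allclose(group_norm(batch_a, 2)[0], group_norm(batch_b, 2)[0])
bn_same = np.allclose(batch_norm(batch_a)[0], batch_norm(batch_b)[0])

# With batch size 1, batch statistics degenerate: every feature equals its own mean.
bn_single = batch_norm(sample)
```

Here `gn_same` is true and `bn_same` is false, and `bn_single` is all zeros, which shows in miniature why mixed shifts and small batches hurt BN-based adaptation while GN/LN are unaffected by batch composition.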

This review was generated by AI and reviewed by human editors.