QUICK REVIEW

[Paper Review] Continual Normalization: Rethinking Batch Normalization for Online Continual Learning

Quang Pham, Chenghao Liu | arXiv (Cornell University) | 2022-03-30
Domain Adaptation and Few-Shot Learning | Citations: 22
One-line summary

Continual Normalization (CN) replaces Batch Normalization in online continual learning, balancing mini-batch and spatial normalization to reduce cross-task forgetting while preserving knowledge transfer.

ABSTRACT

Existing continual learning methods use Batch Normalization (BN) to facilitate training and improve generalization across tasks. However, the non-i.i.d. and non-stationary nature of continual learning data, especially in the online setting, amplifies the discrepancy between training and testing in BN and hinders the performance of older tasks. In this work, we study the cross-task normalization effect of BN in online continual learning where BN normalizes the testing data using moments biased towards the current task, resulting in higher catastrophic forgetting. This limitation motivates us to propose a simple yet effective method that we call Continual Normalization (CN) to facilitate training similar to BN while mitigating its negative effect. Extensive experiments on different continual learning algorithms and online scenarios show that CN is a direct replacement for BN and can provide substantial performance improvements. Our implementation is available at https://github.com/phquang/Continual-Normalization.
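
The cross-task normalization effect described in the abstract can be illustrated with a toy simulation. The sketch below is not taken from the paper: it assumes a single scalar feature, an exponential-moving-average update with momentum 0.1 (a common BN default), and made-up task means of 0.0 and 5.0. The helper name `update_running_mean` is hypothetical.

```python
import numpy as np

def update_running_mean(running_mean, batch_mean, momentum=0.1):
    # Exponential moving average of batch means, as BN layers typically keep.
    return (1.0 - momentum) * running_mean + momentum * batch_mean

rng = np.random.default_rng(0)
running_mean = 0.0

# Hypothetical stream: task A features centered at 0.0, then task B at 5.0.
# Both values are invented for illustration only.
for task_mean in (0.0, 5.0):
    for _ in range(100):
        batch = rng.normal(loc=task_mean, scale=1.0, size=32)
        running_mean = update_running_mean(running_mean, batch.mean())

# At test time, task A data is centered with the stored running mean,
# which by now reflects task B almost entirely.
task_a_test = rng.normal(loc=0.0, scale=1.0, size=1000)
normalized = task_a_test - running_mean

print(f"running mean after task B: {running_mean:.2f}")
print(f"mean of 'normalized' task A test data: {normalized.mean():.2f}")
```

Under these assumptions the stored mean ends up close to the task B mean, so task A test inputs are shifted far from zero after normalization, which is the mismatch the abstract attributes to BN.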

Research motivation and goals

  • Study the cross-task normalization effect of Batch Normalization in online continual learning
  • Identify desirable properties of a normalization layer for continual learning
  • Propose Continual Normalization (CN) to balance training facilitation and forgetting mitigation
  • Demonstrate CN as a drop-in BN replacement with improvements across online protocols

Proposed method

  • CN first applies Group Normalization (GN) without affine parameters to normalize spatial features
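  • It then applies Batch Normalization (BN) with affine parameters gamma and beta to the GN output: a_CN = gamma * BN(a_GN) + beta, where a_GN is the GN output and BN(.) denotes the normalization step before the affine transform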
  • CN uses GN to incorporate spatial information and BN to retain transfer capabilities, enabling adaptive normalization without extra test-time inputs
  • CN does not introduce new learnable parameters beyond BN's gamma and beta and maintains compatibility with existing backbones
  • CN is argued to balance mini-batch and within-sample normalization to reduce cross-task normalization effects
  • Comparison is made against BN, BRN, IN, GN, and SN across online continual learning benchmarks
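
The two-step computation listed above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation (their code is at the repository linked in the abstract): it covers only the training-mode forward pass, omits the running statistics BN would use at test time, and assumes inputs of shape (N, C, H, W). The function names `group_norm`, `batch_norm_train`, and `continual_norm` are hypothetical.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """GN without affine parameters; x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must be divisible by num_groups"
    # Each sample's channels are split into groups and normalized per group.
    g = x.reshape(n, num_groups, -1)
    mean = g.mean(axis=2, keepdims=True)
    var = g.var(axis=2, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """BN in training mode: per-channel moments over (N, H, W), then affine."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def continual_norm(x, gamma, beta, num_groups=8, eps=1e-5):
    """a_CN = gamma * BN(a_GN) + beta, with a_GN the affine-free GN output."""
    a_gn = group_norm(x, num_groups, eps)
    return batch_norm_train(a_gn, gamma, beta, eps)

# Usage with made-up data: 4 samples, 16 channels, 5x5 feature maps.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 16, 5, 5))
gamma = np.ones(16)   # the only learnable parameters, shared with plain BN
beta = np.zeros(16)
out = continual_norm(x, gamma, beta, num_groups=8)
print(out.shape)
```

Because the GN step carries no affine parameters, the layer keeps the same parameter count as BN, which is consistent with the claim that CN can be swapped into existing backbones.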

Experimental results

Research questions

  • RQ1: Does BN improve forward knowledge transfer but incur higher forgetting due to cross-task normalization in online CL?
  • RQ2: Can CN outperform BN and other normalization layers in various online continual learning protocols?
  • RQ3: Is CN a direct, test-time-adaptive replacement for BN with minimal overhead?
  • RQ4: How does CN perform across task-incremental, class-incremental, and task-free online CL settings?

Key results

  • CN consistently achieves the best overall ACC in online task-incremental experiments on Split CIFAR-100 and Split Mini IMN when compared with BN, BRN, IN, GN, and SN
  • CN balances forgetting (FM) and learning accuracy (LA), offering improved stability and transfer over BN in several settings
  • CN demonstrates improved performance over BN in online class-incremental settings under DER++ across Split CIFAR-10 and Split Tiny IMN, with higher ACC and lower FM in many configurations
  • CN shows competitive or superior results with multiple group configurations (G=8, G=32) and across memory sizes, with CN often yielding more stable results than BN
  • In long-tailed online continual learning benchmarks (COCOseq, NUS-WIDEseq), CN provides consistent improvements over BN, particularly in reducing forgetting across metrics
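
The review reports ACC, FM, and LA without defining them. The sketch below assumes the definitions commonly used in the continual learning literature (average final accuracy, forgetting measure, and learning accuracy), computed from a matrix whose entry [i][j] is the accuracy on task j after training on task i. The function name `continual_metrics` and the example numbers are invented for illustration and are not results from the paper.

```python
import numpy as np

def continual_metrics(acc):
    """Return (ACC, FM, LA) from a T x T accuracy matrix.

    acc[i][j] = accuracy on task j after training on tasks 0..i.
    Definitions assumed here (not stated in the review):
      ACC: mean accuracy over all tasks after the final task.
      FM:  mean drop from each earlier task's best past accuracy
           to its final accuracy.
      LA:  mean accuracy on each task right after learning it.
    """
    acc = np.asarray(acc, dtype=float)
    t = acc.shape[0]
    final = acc[-1]
    avg_acc = final.mean()
    learning_acc = np.diag(acc).mean()
    if t < 2:
        forgetting = 0.0  # nothing can be forgotten with a single task
    else:
        forgetting = np.mean(
            [acc[j:t - 1, j].max() - final[j] for j in range(t - 1)]
        )
    return avg_acc, forgetting, learning_acc

# Made-up accuracy matrix for three tasks.
example = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.80],
]
avg_acc, forgetting, learning_acc = continual_metrics(example)
print(avg_acc, forgetting, learning_acc)  # approximately 0.75, 0.15, 0.85
```

Read this way, a method can score a high LA while still showing a high FM, which is why the results above discuss the balance between the two rather than either number alone.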

This review was generated by AI and reviewed by a human editor.