[Paper Review] Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Batch Renormalization extends Batch Normalization to reduce minibatch dependence, enabling stable training with small or non-i.i.d. minibatches while preserving training efficiency and other BN benefits. It introduces a per-dimension affine correction (r, d) that is computed from the minibatch and moving-average statistics, treated as constant during backpropagation, and clipped to bounds that are gradually relaxed over the course of training.
Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches. At the same time, Batch Renormalization retains the benefits of batchnorm such as insensitivity to initialization and training efficiency.
Motivation & Objective
- Identify and address the drawbacks of Batch Normalization when minibatches are small or non-i.i.d.
- Develop an extension that makes training activations depend on individual examples, as they do at inference
- Maintain BN advantages (training speed, initialization robustness) while aligning training and inference activations
- Provide a practical, easy-to-implement method with tunable correction bounds and moving-average updates
Proposed method
- Introduce per-dimension correction factors r and d to Batch Normalization activations, treated as constants during gradient computation
- Compute r and d from the minibatch statistics and the moving averages, clip r to [1/r_max, r_max] and d to [-d_max, d_max], and apply stop_gradient to the clipped values
- Use moving averages mu and sigma during training for correction, with a higher update rate alpha to keep statistics current
- Gradually relax the correction bounds during training to transition from BN to Renorm
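- Provide explicit backpropagation equations for the normalized activations, the minibatch mean and standard deviation, the inputs x, and the parameters gamma and beta, with r and d entering only as constant factors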
- Outline an algorithm that updates mu and sigma and applies the renormalization in forward and backward passes
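The training-mode forward pass described in the list above can be sketched for a single feature dimension as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `batch_renorm_train_step`, its argument names, and the default values of `alpha` and `eps` are assumptions made for the example. In an autodiff framework, `r` and `d` would additionally be wrapped in a stop-gradient so that they act as constants in the backward pass.

```python
import math


def batch_renorm_train_step(xs, mu, sigma, r_max, d_max,
                            gamma=1.0, beta=0.0, alpha=0.01, eps=1e-5):
    """Hypothetical sketch of one Batch Renorm training step for one dimension.

    xs        : activations of this dimension across the minibatch
    mu, sigma : moving averages of the mean and standard deviation
    Returns (ys, new_mu, new_sigma).
    """
    m = len(xs)
    # Minibatch statistics, as in ordinary Batch Normalization.
    mu_b = sum(xs) / m
    sigma_b = math.sqrt(sum((x - mu_b) ** 2 for x in xs) / m + eps)

    # Correction factors, clipped to their bounds. They are treated as
    # constants: no gradient flows through r or d.
    r = min(max(sigma_b / sigma, 1.0 / r_max), r_max)
    d = min(max((mu_b - mu) / sigma, -d_max), d_max)

    # Normalize with minibatch statistics, then apply the affine correction.
    ys = [gamma * ((x - mu_b) / sigma_b * r + d) + beta for x in xs]

    # Update the moving averages with rate alpha.
    new_mu = mu + alpha * (mu_b - mu)
    new_sigma = sigma + alpha * (sigma_b - sigma)
    return ys, new_mu, new_sigma
```

With `r_max = 1` and `d_max = 0` the correction is the identity and the step reduces to ordinary Batch Normalization. When the bounds are loose enough that no clipping occurs, the output for each example equals `(x - mu) / sigma`, which is what the inference model computes from the moving averages.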
Experimental results
Research questions
- RQ1: Can Batch Renormalization reduce the mismatch between training and inference activations observed with Batch Normalization on small or non-i.i.d. minibatches?
- RQ2: Does Batch Renormalization retain Batch Normalization benefits (training speed, initialization insensitivity) while improving performance on challenging minibatch regimes?
- RQ3: How should the correction bounds (r_max, d_max) and moving-average update rate (alpha) be scheduled for stable training?
- RQ4: Is Batch Renormalization effective across architectures and tasks where BN is typically used (e.g., image classification with Inception-v3)?
Key findings
- On ImageNet with Inception-v3, trained with minibatches of 32 per worker across 50 workers, Batch Renormalization matches or marginally exceeds the validation accuracy of Batch Normalization (78.5% with Renorm vs 78.3% for the BN baseline)
- With microbatches of 4 (small minibatches), Batch Renorm trains faster and attains higher accuracy (76.5% at 130k steps) than BatchNorm (74.2% at 210k steps)
- On non-i.i.d. minibatches sampled by label, Batch Normalization degrades sharply, while Batch Renormalization recovers accuracy close to the i.i.d. baseline (78.5% at 120k steps)
- Batch Renormalization eliminates the overfitting to biased minibatch distributions that Batch Normalization exhibits in metric-learning-style minibatch setups
- The method remains easy to implement, runs with similar speed to BN, and introduces hyperparameters (alpha, r_max, d_max) with a schedule for relaxing correction during training
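One way to picture the schedule for relaxing the correction is a warmup phase with `r_max = 1` and `d_max = 0` (plain Batch Normalization), followed by a ramp up to the final bounds. The sketch below is an assumption-laden illustration: this review gives no concrete schedule, so the linear ramp, the helper names `linear_ramp` and `renorm_bounds`, and every default step count and final value are placeholders to be tuned per model.

```python
def linear_ramp(step, start_step, end_step, start_value, end_value):
    """Linearly interpolate between two values over [start_step, end_step]."""
    if step <= start_step:
        return start_value
    if step >= end_step:
        return end_value
    frac = (step - start_step) / (end_step - start_step)
    return start_value + frac * (end_value - start_value)


def renorm_bounds(step, warmup_steps=5_000, r_end_step=40_000,
                  d_end_step=25_000, r_max_final=3.0, d_max_final=5.0):
    """Hypothetical schedule: all defaults are illustrative placeholders.

    During warmup the bounds are (1, 0), so the layer behaves like plain
    Batch Normalization; afterwards the bounds are relaxed gradually.
    """
    r_max = linear_ramp(step, warmup_steps, r_end_step, 1.0, r_max_final)
    d_max = linear_ramp(step, warmup_steps, d_end_step, 0.0, d_max_final)
    return r_max, d_max
```

Calling `renorm_bounds(step)` once per training step and passing the result to the normalization layer keeps the early phase of training identical to Batch Normalization and lets the correction take effect only once the moving averages have had time to settle.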
This review was created by AI and reviewed by human editors.