[Paper Review] Learning an Adaptive Learning Rate Schedule
The paper introduces a reinforcement learning framework to auto-learn an adaptive learning rate schedule that responds to training dynamics and demonstrates improved results and transferability across datasets and architectures.
The learning rate is one of the most important hyper-parameters for model training and generalization. However, current hand-designed parametric learning rate schedules offer limited flexibility, and a predefined schedule may not match the training dynamics of high-dimensional, non-convex optimization problems. In this paper, we propose a reinforcement learning based framework that can automatically learn an adaptive learning rate schedule by leveraging the information from past training histories. The learning rate dynamically changes based on the current training dynamics. To validate this framework, we conduct experiments with different neural network architectures on the Fashion-MNIST and CIFAR-10 datasets. Experimental results show that the auto-learned learning rate controller can achieve better test results. In addition, the trained controller network is generalizable: it can be trained on one dataset and transferred to new problems.
Motivation & Objective
- Motivate the need for flexible learning rate schedules beyond fixed parametric forms due to diverse training dynamics in high-dimensional non-convex optimization.
- Propose a reinforcement learning framework to automatically adapt the learning rate based on past training history.
- Define suitable state features, reward signals, and action design to enable stable learning rate control.
- Demonstrate improved generalization and transferability of the learned controller across datasets and architectures.
Proposed method
- A reinforcement learning controller proposes learning rate scaling factors based on training dynamics observed from the trainee network.
- State observations include train/validation loss, prediction variances, and statistics of the final layer weights, plus the previous learning rate.
- Reward is based on the per-step validation loss, so the controller receives feedback at every step rather than only at the end of training, which eases credit assignment.
- Action is a learning-rate scaling factor applied to the previous step’s learning rate, enabling warm-up and decay.
- The controller is trained with Proximal Policy Optimization (PPO) to learn a policy that minimizes cumulative validation loss.
- Experiments compare auto-learned schedules against a baseline step-decay schedule on Fashion-MNIST and CIFAR-10 using CNN and ResNet architectures.
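The state, action, and reward described above can be sketched for a single control step as follows. This is a minimal illustration, not the authors' implementation: the names `ControllerState`, `build_state`, `apply_scaling_action`, and `reward_from_val_loss` are hypothetical, the learning-rate bounds are assumed, reading "prediction variances" as the variance of the model outputs is an assumption, and negating the validation loss to obtain a reward is an assumption made so that maximizing reward corresponds to minimizing validation loss.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import List, Sequence


@dataclass
class ControllerState:
    """Hypothetical container for the observations the review lists."""
    train_loss: float
    val_loss: float
    pred_variance: float
    last_layer_weight_mean: float
    last_layer_weight_var: float
    prev_lr: float

    def as_vector(self) -> List[float]:
        # Flat feature vector that a policy network could consume.
        return [
            self.train_loss,
            self.val_loss,
            self.pred_variance,
            self.last_layer_weight_mean,
            self.last_layer_weight_var,
            self.prev_lr,
        ]


def build_state(
    train_loss: float,
    val_loss: float,
    predictions: Sequence[float],
    last_layer_weights: Sequence[float],
    prev_lr: float,
) -> ControllerState:
    # Assumption: simple population statistics stand in for the paper's features.
    return ControllerState(
        train_loss=train_loss,
        val_loss=val_loss,
        pred_variance=pvariance(predictions),
        last_layer_weight_mean=mean(last_layer_weights),
        last_layer_weight_var=pvariance(last_layer_weights),
        prev_lr=prev_lr,
    )


def apply_scaling_action(
    prev_lr: float, scale: float, lr_min: float = 1e-6, lr_max: float = 1.0
) -> float:
    """Multiply the previous learning rate by the scaling factor.

    The clipping bounds lr_min and lr_max are assumed, not taken from the paper.
    """
    return min(lr_max, max(lr_min, prev_lr * scale))


def reward_from_val_loss(val_loss: float) -> float:
    # Assumption: negate the loss so that a lower validation loss gives a higher reward.
    return -val_loss
```

In a full setup, the vector from `as_vector()` would be fed to a PPO-trained policy whose output is the `scale` argument. That policy is outside the scope of this sketch.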
Experimental results
Research questions
- RQ1: Can an RL-based controller learn a more effective adaptive learning rate schedule than fixed-step parametric schedules?
- RQ2: Do learned controllers generalize across different datasets and model architectures?
- RQ3: Does using per-step validation loss as reward improve credit assignment compared to final-only rewards?
- RQ4: Is a learning-rate scaling action more stable and transferable than directly outputting raw learning rates?
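To make the credit-assignment question in RQ3 concrete, the sketch below compares the return seen at each step under a per-step reward with the return under a final-only reward. The validation-loss values are made up for illustration, the helper `discounted_returns` is hypothetical, and using the negative validation loss as the reward is an assumption. It does not reproduce any experiment from the paper.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go at each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Illustrative validation losses for four controller steps (not from the paper).
val_losses = [1.0, 0.8, 0.9, 0.5]

# Per-step reward: every step is scored by its own validation loss.
per_step_rewards = [-loss for loss in val_losses]

# Final-only reward: all steps but the last receive zero.
final_only_rewards = [0.0] * (len(val_losses) - 1) + [-val_losses[-1]]

# gamma = 1.0 keeps the arithmetic easy to follow.
per_step_returns = discounted_returns(per_step_rewards, gamma=1.0)
final_only_returns = discounted_returns(final_only_rewards, gamma=1.0)
```

With an undiscounted final-only reward, every step sees the same return, so the signal cannot distinguish the action that raised the loss at the third step from the others. The per-step returns differ from step to step, which is the kind of denser feedback the review attributes to the paper's reward design.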
Key findings
- Auto-learned schedules achieve better test loss and accuracy than the baseline step-decay schedules across all tested tasks.
- Controllers exhibit diverse learned patterns (e.g., warm-up then decay, or flat then warm-up and decay) tailored to the model/dataset, indicating dynamic adaptation.
- Transfer experiments show controllers trained on CIFAR-10 transfer effectively to Fashion-MNIST, outperforming transferred baselines.
- Per-step reward signals improve training dynamics and enable more stable learning-rate control than final-only rewards.
- The approach generalizes to CNN and ResNet architectures on both datasets.
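The schedule shapes mentioned in the findings follow directly from composing multiplicative actions. The sketch below, with a hypothetical helper `schedule_from_scales` and made-up scaling factors, shows how factors above 1 produce a warm-up, factors equal to 1 hold the rate flat, and factors below 1 produce a decay. The initial learning rate and factor values are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def schedule_from_scales(initial_lr: float, scales: Sequence[float]) -> List[float]:
    """Unroll a sequence of scaling actions into the learning rates they produce."""
    lrs = [initial_lr]
    for scale in scales:
        # Each action rescales the learning rate from the previous step.
        lrs.append(lrs[-1] * scale)
    return lrs


# Warm-up (scale > 1), hold (scale == 1), then decay (scale < 1).
warmup_then_decay = schedule_from_scales(0.01, [2.0, 2.0, 1.0, 0.5, 0.5, 0.5])

# Flat start, then warm-up and decay.
flat_then_warmup_decay = schedule_from_scales(0.01, [1.0, 1.0, 2.0, 0.5, 0.5])
```

Because each action is relative to the previous rate, the same controller output means the same proportional change regardless of the absolute learning rate, which is one plausible reason a scaling action would transfer more easily than a raw learning-rate output.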
This review was created by AI and reviewed by human editors.