[Paper Review] Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control
This paper presents the first comprehensive empirical study of conventional regularization techniques, such as L2 regularization and dropout, applied to policy networks in deep reinforcement learning for continuous control. It finds that regularizing the policy network often brings large improvements, especially on harder tasks, and it analyzes why regularization may help generalization from four perspectives: sample complexity, reward distribution, weight norm, and noise robustness.
Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment, and because the deep RL community focuses more on high-level algorithm designs. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find conventional regularization techniques on the policy networks can often bring large improvement, especially on harder tasks. Our findings are shown to be robust against training hyperparameter variations. We also compare these techniques with the more widely used entropy regularization. In addition, we study regularizing different components and find that only regularizing the policy network is typically the best. We further analyze why regularization may help generalization in RL from four perspectives - sample complexity, reward distribution, weight norm, and noise robustness. We hope our study provides guidance for future practices in regularizing policy optimization algorithms. Our code is available at this https URL .
Motivation & Objective
- To investigate the impact of conventional regularization techniques (e.g., L2, dropout) on policy optimization in deep reinforcement learning.
- To determine whether regularization improves generalization and sample efficiency in continuous control tasks.
- To compare the effectiveness of conventional regularization with entropy regularization, a widely used technique in RL.
- To analyze which components of the agent (e.g., the policy network versus the value network) benefit most from regularization.
- To understand why regularization may improve performance in RL, examined from several perspectives.
Proposed method
- Empirically evaluate multiple regularization techniques, such as L2 regularization, dropout, and batch normalization, on policy networks across several continuous control environments.
- Apply regularization to different components of the agent (e.g., the policy network, the value network) and compare performance.
- Use multiple policy optimization algorithms (A2C, TRPO, PPO, and SAC) and vary training hyperparameters to test the robustness of the regularization effects.
- Analyze the impact of regularization from four perspectives: sample complexity, reward distribution, weight norm, and noise robustness.
- Conduct ablation studies to isolate the contribution of regularization on the policy network versus other components.
- Release code to enable reproducibility and further benchmarking of regularization in policy optimization.
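The first two items in the list above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python lists rather than a deep learning framework, and the function names, the weight layout, and the coefficient values are all hypothetical. It shows an L2 penalty added to the policy loss only, and inverted dropout applied to a hidden activation vector.

```python
import random

def l2_penalty(weights, coeff):
    """Return coeff * (sum of squared weights) over all layers.

    `weights` is a list of layers, each a flat list of floats.
    Some conventions include a factor of 1/2; this sketch does not.
    """
    return coeff * sum(w * w for layer in weights for w in layer)

def regularized_policy_loss(policy_loss, policy_weights, coeff=1e-4):
    """Add the L2 penalty to the policy loss; any value loss is left untouched."""
    return policy_loss + l2_penalty(policy_weights, coeff)

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the survivors by 1 / (1 - rate). Identity when not training."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hypothetical policy weights (two layers) and an unregularized loss value.
policy_weights = [[0.5, -1.0], [2.0]]
loss = regularized_policy_loss(1.25, policy_weights, coeff=0.1)
print(loss)

# Dropout on a hypothetical hidden activation vector.
hidden = dropout([0.2, 0.4, 0.6, 0.8], rate=0.5, rng=random.Random(0))
print(hidden)
```

In a framework-based setup the same idea is usually expressed through the optimizer's weight-decay setting or a dropout layer inside the policy network. The explicit version here only makes visible which loss the penalty is attached to.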
Experimental results
Research questions
- RQ1: Does applying conventional regularization techniques (e.g., L2, dropout) to the policy network lead to performance gains in continuous control tasks?
- RQ2: How does conventional regularization compare in effectiveness to entropy regularization in policy optimization?
- RQ3: Which components of the agent (e.g., the policy network, the value network) benefit most from regularization?
- RQ4: Are the benefits of regularization robust across different hyperparameter settings and environments?
- RQ5: What are the underlying reasons why regularization improves generalization in deep RL?
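For RQ2, it may help to see how the two kinds of regularizer enter a loss. The following is a generic sketch in assumed notation, not the paper's own formulation: $\mathcal{L}_{\pi}(\theta)$ is the unregularized policy loss, $\lambda$ and $\beta$ are hypothetical coefficients, and $\mathcal{H}$ is the entropy of the action distribution.

$$
\mathcal{L}_{L_2}(\theta) = \mathcal{L}_{\pi}(\theta) + \lambda \lVert \theta \rVert_2^2
\qquad
\mathcal{L}_{\mathrm{ent}}(\theta) = \mathcal{L}_{\pi}(\theta) - \beta \, \mathbb{E}_{s}\!\left[ \mathcal{H}\big(\pi_{\theta}(\cdot \mid s)\big) \right]
$$

The $L_2$ term penalizes the parameters directly, while the entropy term acts on the policy's output distribution by rewarding more stochastic action choices.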
Key findings
- Conventional regularization techniques such as L2 and dropout often bring large improvements on continuous control tasks, particularly on harder environments.
- Regularizing only the policy network is typically better than regularizing other components, such as the value network.
- The performance gains from regularization are robust across different hyperparameter settings, indicating broad applicability.
- The analysis suggests that regularization may help generalization, as seen through the reward distribution and the weight norms of the trained policies.
- The benefits of regularization may be partly attributable to improved noise robustness, that is, reduced sensitivity to input perturbations.
- The study demonstrates that regularization can enhance sample efficiency and reduce variance in learning dynamics.
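The weight-norm and noise-robustness findings above can be made concrete with a toy sketch. It is an illustration under strong assumptions, not the paper's measurement protocol: the policy is a hypothetical deterministic linear map, and all names and numbers are invented for the example. For a linear policy, halving the weights halves the response to observation noise, which is the simplest case of the link between weight norm and sensitivity.

```python
import math
import random

def weight_norm(weights):
    """L2 norm of a flat list of weights."""
    return math.sqrt(sum(w * w for w in weights))

def linear_policy(weights, obs):
    """Toy deterministic policy: the action is the dot product of weights and observation."""
    return sum(w * o for w, o in zip(weights, obs))

def action_sensitivity(weights, obs, noise_std, rng, trials=200):
    """Mean absolute change in the action when Gaussian noise is added to the observation."""
    clean = linear_policy(weights, obs)
    total = 0.0
    for _ in range(trials):
        noisy = [o + rng.gauss(0.0, noise_std) for o in obs]
        total += abs(linear_policy(weights, noisy) - clean)
    return total / trials

obs = [0.3, -0.7, 1.1]
large = [4.0, -3.0, 2.0]            # hypothetical weights with a larger norm
small = [w * 0.5 for w in large]    # same direction, half the norm

# Use the same seed for both runs so that the noise samples are identical.
s_large = action_sensitivity(large, obs, 0.1, random.Random(0))
s_small = action_sensitivity(small, obs, 0.1, random.Random(0))
print(weight_norm(large), weight_norm(small))
print(s_large, s_small)
```

Real policy networks are nonlinear, so the relationship is not this exact. The sketch only shows the kind of quantity one might compare between regularized and unregularized policies.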
This review was created by AI and reviewed by human editors.