QUICK REVIEW

[Paper Review] Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck

Maximilian Igl, Kamil Ciosek|arXiv (Cornell University)|Oct 28, 2019
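Reinforcement Learning in Robotics | Cited by 57
One-line summary

This paper introduces Selective Noise Injection (SNI) and Information Bottleneck Actor Critic (IBAC), and shows that combining the IB with SNI achieves state-of-the-art generalization performance in reinforcement learning on the CoinRun and Multiroom benchmarks.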

ABSTRACT

The ability for policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent's policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL. In particular, we focus on regularization techniques relying on the injection of noise into the learned function, a family that includes some of the most widely used approaches such as Dropout and Batch Normalization. To adapt them to RL, we propose Selective Noise Injection (SNI), which maintains the regularizing effect the injected noise has, while mitigating the adverse effects it has on the gradient quality. Furthermore, we demonstrate that the Information Bottleneck (IB) is a particularly well suited regularization technique for RL as it is effective in the low-data regime encountered early on in training RL agents. Combining the IB with SNI, we significantly outperform current state of the art results, including on the recently proposed generalization benchmark Coinrun.

Motivation and objectives

  • Motivate regularization for RL to improve generalization across unseen environments.
  • Adapt stochastic regularization techniques to RL without destabilizing training.
  • Promote feature compression to enhance robustness under non-stationary data distributions.
  • Propose IBAC to encourage compressed, transferable representations in actor-critic RL.
  • Evaluate the proposed methods on challenging generalization tasks and compare to prior art.

Proposed method

  • Introduce Selective Noise Injection (SNI), which applies stochastic regularization only where it is beneficial and uses the deterministic (noise-free) network otherwise.
  • Adapt Dropout and Variational Information Bottleneck (VIB) for RL; use SNI to mitigate adverse gradient and data-quality effects.
  • Develop Information Bottleneck Actor Critic (IBAC) by integrating IB principles into an actor-critic RL framework.
  • Formulate IBAC objective as a combination of actor-critic losses, IB regularization, and an entropy/regularization term.
  • Combine IBAC with SNI to reduce variance in off-policy corrections and improve generalization.
  • Evaluate on PPO-based actor-critic setup across Multiroom and CoinRun benchmarks.
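The bullets above can be made concrete with a small sketch. It is one plausible reading of the summary, not the authors' implementation: a variational IB encoder outputs a mean and log-variance per state, the policy loss is computed once with the noise suppressed (z = mu) and once with a sampled latent, and the two are mixed before a KL penalty is added. The names `ibac_sni_loss`, `lam`, and `beta`, their default values, and the linear softmax policy head are all assumptions made for illustration. The value loss, entropy bonus, and PPO clipping are omitted.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps (noise switched on)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_norm for x in logits]


def policy_loss(z, weights, action, advantage):
    """Policy-gradient surrogate -A * log pi(a | z) for a linear softmax head.

    weights[k] is the (hypothetical) weight vector producing the logit of action k.
    """
    logits = [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]
    return -advantage * log_softmax(logits)[action]


def ibac_sni_loss(mu, log_var, weights, action, advantage, rng, lam=0.5, beta=1e-3):
    """Mix a noise-free and a noisy policy loss, then add the IB (KL) penalty.

    lam and beta are assumed hyperparameters; their defaults are placeholders.
    """
    loss_det = policy_loss(mu, weights, action, advantage)  # noise suppressed: z = mu
    z = sample_latent(mu, log_var, rng)                     # noise injected
    loss_noisy = policy_loss(z, weights, action, advantage)
    kl = kl_to_standard_normal(mu, log_var)
    return lam * loss_det + (1.0 - lam) * loss_noisy + beta * kl


# Toy usage with made-up numbers: 2-dimensional latent, 2 actions.
rng = random.Random(0)
loss = ibac_sni_loss(
    mu=[0.2, -0.1],
    log_var=[-1.0, -1.0],
    weights=[[0.5, 0.0], [0.0, 0.5]],
    action=0,
    advantage=1.0,
    rng=rng,
)
```

Setting `lam=1.0` removes the noisy term entirely, while `lam=0.0` recovers a plain noisy IB loss; in this sketch the rollout policy would likewise use `mu` directly, so that data collection is not degraded by the injected noise.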

Experimental results

Research questions

  • RQ1: How can stochastic regularization be safely integrated into RL without harming gradient quality and data efficiency?
  • RQ2: Does selective noise application preserve the regularizing benefits while avoiding destabilization in actor-critic RL?
  • RQ3: Can an Information Bottleneck-based regularization improve generalization in RL in low-data early training stages?
  • RQ4: Does combining IBAC with SNI yield superior generalization performance on challenging RL benchmarks like Multiroom and CoinRun?

Key findings

  • Selective Noise Injection reduces adverse effects of noise on rollout quality and gradient variance.
  • IBAC encourages compressed input features leading to improved generalization in RL, especially in low-data regimes.
  • IBAC combined with SNI outperforms prior state of the art on CoinRun and Multiroom benchmarks.
  • SNI helps stabilize training when stochastic regularization is used with IBAC.
  • On CoinRun, IBAC with SNI significantly outperforms the baseline and other regularization regimes that rely solely on non-stochastic techniques.
  • IBAC without proper regularization can underperform, particularly with heavier stochasticity; SNI mitigates this risk.

This review was generated by AI and checked by a human editor.