QUICK REVIEW

[Paper Review] Rethinking Plasticity in Deep Reinforcement Learning

Zhiqiang He|arXiv (Cornell University)|Mar 22, 2026

Reinforcement Learning in Robotics | Citations: 0

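One-sentence summary

This paper proposes the Optimization-Centric Plasticity (OCP) hypothesis to explain plasticity loss in deep RL, establishes an equivalence between neuron dormancy and zero-gradient states, and shows that plasticity loss is task-dependent and that parameter constraints help mitigate it.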

ABSTRACT

This paper investigates the fundamental mechanisms driving plasticity loss in deep reinforcement learning (RL), a critical challenge where neural networks lose their ability to adapt to non-stationary environments. While existing research often relies on descriptive metrics like dormant neurons or effective rank, these summaries fail to explain the underlying optimization dynamics. We propose the Optimization-Centric Plasticity (OCP) hypothesis, which posits that plasticity loss arises because optimal points from previous tasks become poor local optima for new tasks, trapping parameters during task transitions and hindering subsequent learning. We theoretically establish the equivalence between neuron dormancy and zero-gradient states, demonstrating that the absence of gradient signals is the primary driver of dormancy. Our experiments reveal that plasticity loss is highly task-specific; notably, networks with high dormancy rates in one task can achieve performance parity with randomly initialized networks when switched to a significantly different task, suggesting that the network's capacity remains intact but is inhibited by the specific optimization landscape. Furthermore, our hypothesis elucidates why parameter constraints mitigate plasticity loss by preventing deep entrenchment in local optima. Validated across diverse non-stationary scenarios, our findings provide a rigorous optimization-based framework for understanding and restoring network plasticity in complex RL domains.

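Research motivation and objectives

Propose the Optimization-Centric Plasticity (OCP) hypothesis to explain plasticity loss in deep RL.
Show that the optimum reached on a previous task becomes a poor local optimum for the new task, leaving the parameters trapped there.
Theoretically link neuron dormancy to zero-gradient states.
Show that plasticity loss is strongly task-dependent and is modulated by the relationship between the tasks' optimization landscapes.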

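Proposed method

Formulate dormancy as a gradient-driven phenomenon and prove the equivalence between dormant neurons and zero gradients.
Define and analyze a dormancy index and MAGI, which quantify neuron activation and gradients.
Run controlled experiments with PPO under task transitions, showing entrapment in local optima and the wake-up of dormant neurons when the task changes.
Argue that parameter constraints mitigate plasticity loss by limiting early entrapment in local optima.
Provide theoretical lemmas and theorems connecting dormancy, zero gradients, and stationary neuron outputs.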
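
To make the link between dormancy and zero gradients concrete, here is a minimal derivation for the special case of a single ReLU unit. The notation (w_i, b_i, z_i, h_i) is assumed for illustration and is not taken from the paper, whose lemmas may be stated more generally. The sketch covers only one direction, that an always-inactive unit receives no gradient.

```latex
% Assumed notation (not from the paper): hidden unit i with weights w_i,
% bias b_i, pre-activation z_i, ReLU output h_i, and training loss \mathcal{L}.
% The ReLU derivative at z = 0 is taken to be 0 (the usual convention).
\begin{align}
  z_i(x) &= w_i^\top x + b_i, \qquad h_i(x) = \max\bigl(0,\, z_i(x)\bigr), \\
  \frac{\partial \mathcal{L}}{\partial w_i}
    &= \mathbb{E}_{x}\!\left[ \frac{\partial \mathcal{L}}{\partial h_i(x)}\,
       \mathbf{1}\{z_i(x) > 0\}\; x \right], \qquad
  \frac{\partial \mathcal{L}}{\partial b_i}
    = \mathbb{E}_{x}\!\left[ \frac{\partial \mathcal{L}}{\partial h_i(x)}\,
       \mathbf{1}\{z_i(x) > 0\} \right].
\end{align}
% If h_i(x) = 0 for every x in the data, then z_i(x) \le 0 everywhere,
% the indicator is identically zero, and both gradients vanish exactly.
```

Under these assumptions, a gradient step leaves w_i and b_i unchanged, so the unit stays inactive for as long as the data distribution stays the same.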

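Experimental results

Research questions

RQ1: What mechanism causes plasticity loss in deep RL during task transitions?
RQ2: How are neuron dormancy and zero-gradient states related, theoretically and empirically?
RQ3: How do task relationships and the optimization landscape affect plasticity loss and learning adaptability?
RQ4: Can parameter constraints preserve plasticity by limiting entrapment in local optima?
RQ5: Is plasticity loss highly task-specific rather than a general data problem?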

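Key findings

Plasticity loss is explained by parameters becoming trapped in local optima inherited from the previous task, which hinders learning on the new task.
Dormant neurons correspond to a zero-output, zero-gradient condition, so dormancy is reinforced through updates.
Task changes that substantially alter the objective reduce dormancy and allow learning comparable to that from random initialization.
High dormancy does not necessarily imply reduced adaptability; the network can recover when the task changes.
Constraining the parameter space mitigates plasticity loss by preventing deep entrapment in local optima.
Gradient-free optimization approaches may reduce plasticity loss and improve adaptation speed and final performance.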
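
The finding above that dormant neurons sit at zero output and zero gradient, and so stay dormant under updates, can be illustrated with a small toy sketch. This is not the paper's PPO setup. The network, the data, and every name here (`forward`, `hidden_grads`, `mean_abs_activation`, `batch`) are hypothetical and chosen only for illustration.

```python
# Toy illustration only: a hand-built one-hidden-layer ReLU network,
# not the paper's experimental setup. All names and values are hypothetical.

def forward(W, b, v, x):
    """Return (pre-activations, ReLU activations, scalar output) for one input x."""
    pre = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    hid = [p if p > 0.0 else 0.0 for p in pre]
    out = sum(vi * hi for vi, hi in zip(v, hid))
    return pre, hid, out


def hidden_grads(W, b, v, batch):
    """Mean-squared-error gradients w.r.t. each hidden unit's weights and bias."""
    gW = [[0.0] * len(row) for row in W]
    gb = [0.0] * len(b)
    for x, y in batch:
        pre, _, out = forward(W, b, v, x)
        err = 2.0 * (out - y) / len(batch)
        for i in range(len(W)):
            gate = 1.0 if pre[i] > 0.0 else 0.0  # ReLU derivative, 0 when z <= 0
            delta = err * v[i] * gate
            gb[i] += delta
            for j, xj in enumerate(x):
                gW[i][j] += delta * xj
    return gW, gb


def mean_abs_activation(W, b, v, batch):
    """Average |activation| of each hidden unit over the batch."""
    totals = [0.0] * len(W)
    for x, _ in batch:
        _, hid, _ = forward(W, b, v, x)
        totals = [t + abs(h) / len(batch) for t, h in zip(totals, hid)]
    return totals


# Hypothetical data: non-negative inputs with arbitrary targets.
batch = [([0.0, 1.0], 1.0), ([1.0, 0.0], 0.5), ([0.5, 0.5], 0.2), ([1.0, 1.0], 0.0)]
W = [[1.0, 0.5], [0.5, -0.2], [-1.0, -1.0]]  # unit 2 is negative on every input
b = [0.1, 0.0, -0.5]
v = [0.3, -0.4, 0.7]  # output weights, held fixed to keep the sketch short

W_start = [row[:] for row in W]
b_start = b[:]
lr = 0.05
for _ in range(50):  # plain gradient descent on the hidden layer only
    gW, gb = hidden_grads(W, b, v, batch)
    W = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, gW)]
    b = [bi - lr * g for bi, g in zip(b, gb)]

activity = mean_abs_activation(W, b, v, batch)
print("mean |activation| per unit:", activity)
print("unit 2 unchanged:", W[2] == W_start[2] and b[2] == b_start[2])
print("unit 0 changed:", W[0] != W_start[0])
```

In this sketch unit 2 never fires on the given inputs, so its gradient is zero at every step and its parameters never move, while the active units are updated. It says nothing about what happens after a task switch, which is the part the paper's experiments address.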

This review was generated by AI and checked by a human editor.