QUICK REVIEW

[Paper Review] Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations with Noise.

Arunselvan Ramaswamy, Shalabh Bhatnagar|arXiv (Cornell University)|Sep 14, 2017

Stochastic processes and financial applications | Citations: 1

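One-Line Summary

This paper establishes easily verifiable sufficient conditions for the stability and convergence of set-valued stochastic approximation algorithms, using Lyapunov functions that can be constructed explicitly. It gives a complete, model-free analysis of the stochastic approximation counterpart of approximate value iteration and presents the first stochastic approximation algorithm for finding fixed points of contractive set-valued maps, under assumptions significantly weaker than earlier ones while still guaranteeing almost sure convergence.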

ABSTRACT

The main aim of this paper is the development of easily verifiable sufficient conditions for stability (almost sure boundedness) and convergence of stochastic approximation algorithms (SAAs) with set-valued mean-fields, a class of model-free algorithms that have become important in recent times. In this paper we provide a complete analysis of such algorithms under three different, yet related sets of sufficient conditions, based on the existence of an associated global/local Lyapunov function. Unlike previous Lyapunov function based approaches, we provide a simple recipe for explicitly constructing the Lyapunov function, needed for analysis. Our work builds on the works of Abounadi, Bertsekas and Borkar (2002), Munos (2005), and Ramaswamy and Bhatnagar (2016). An important motivation for the flavor of our assumptions comes from the need to understand dynamic programming and reinforcement learning algorithms, that use deep neural networks (DNNs) for function approximations and parameterizations. These algorithms are popularly known as deep learning algorithms. As an important application of our theory, we provide a complete analysis of the stochastic approximation counterpart of approximate value iteration (AVI), an important dynamic programming method designed to tackle Bellman's curse of dimensionality. Further, the assumptions involved are significantly weaker, easily verifiable and truly model-free. The theory presented in this paper is also used to develop and analyze the first SAA for finding fixed points of contractive set-valued maps.
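
For orientation, the class of recursions the abstract refers to is usually written in the form below. The notation ($x_n$, $a(n)$, $H$, $M_{n+1}$) and the step-size conditions are the standard ones in the stochastic approximation literature and are assumptions here, not quoted from the paper:

```latex
\begin{align*}
x_{n+1} &= x_n + a(n)\,\bigl[\,y_n + M_{n+1}\,\bigr], \qquad y_n \in H(x_n),\\
\sum_{n} a(n) &= \infty, \qquad \sum_{n} a(n)^2 < \infty .
\end{align*}
```

In this reading, $H$ is the set-valued mean field, $M_{n+1}$ is a zero-mean (martingale difference) noise term, and stability means $\sup_n \|x_n\| < \infty$ almost surely.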

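Motivation and Objectives

Develop easily verifiable sufficient conditions that guarantee almost sure boundedness and convergence of stochastic approximation algorithms with set-valued mean fields.
Provide a systematic method for constructing the Lyapunov functions needed to verify stability and convergence.
Analyze the stochastic approximation counterpart of approximate value iteration (AVI), a central dynamic programming method, and thereby address the curse of dimensionality in a model-free setting.
Extend the theory to develop the first stochastic approximation algorithm for finding fixed points of contractive set-valued maps.
Support the analysis of deep reinforcement learning algorithms that use deep neural networks for function approximation.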

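Proposed Method

Builds on the prior work of Abounadi, Bertsekas and Borkar (2002), Munos (2005), and Ramaswamy and Bhatnagar (2016), using Lyapunov-function-based analysis.
Introduces three related sets of sufficient conditions for stability and convergence, each based on the existence of a global or local Lyapunov function.
Provides a constructive method for explicitly building the required Lyapunov function, which simplifies verification in practice.
Applies the framework to analyze the stochastic approximation counterpart of approximate value iteration (AVI), one of the main dynamic programming methods.
Extends the theory to set-valued maps, developing and analyzing the first stochastic approximation algorithm for finding fixed points of contractive set-valued maps.
Keeps the assumptions weaker, more intuitive, and truly model-free, which makes them suitable for deep-learning-based reinforcement learning.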
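
To make the AVI item concrete, here is a minimal sketch of a noisy approximate value iteration of the form J_{n+1} = J_n + a(n)[T J_n - J_n + eps_n + M_{n+1}], where eps_n is a bounded approximation error and M_{n+1} is zero-mean noise. The two-state MDP, the discount factor, the error bound, the noise distribution, and the step size a(n) = 1/(n+1) are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import random

GAMMA = 0.5        # discount factor (assumed for illustration)
EPS_BOUND = 0.1    # sup-norm bound on the approximation error (assumed)

# Hypothetical 2-state, 2-action MDP: COST[s][a] and P[s][a][next_state].
COST = [[1.0, 2.0], [0.5, 3.0]]
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]


def bellman(J):
    """Bellman operator T of a cost-minimizing discounted MDP."""
    states = range(len(J))
    return [
        min(
            COST[s][a] + GAMMA * sum(P[s][a][t] * J[t] for t in states)
            for a in range(len(COST[s]))
        )
        for s in states
    ]


def exact_value_iteration(num_iters=200):
    """Noise-free value iteration, used only as a reference solution."""
    J = [0.0, 0.0]
    for _ in range(num_iters):
        J = bellman(J)
    return J


def noisy_avi(num_iters=20000, seed=0):
    """Stochastic approximation counterpart of AVI (illustrative sketch)."""
    rng = random.Random(seed)
    J = [0.0, 0.0]
    for n in range(num_iters):
        a_n = 1.0 / (n + 1)  # sum a_n = inf, sum a_n^2 < inf
        TJ = bellman(J)
        J = [
            J[s] + a_n * (
                TJ[s] - J[s]
                + rng.uniform(-EPS_BOUND, EPS_BOUND)  # bounded approximation error
                + rng.uniform(-0.1, 0.1)              # zero-mean noise
            )
            for s in range(len(J))
        ]
    return J


def sup_distance(J, K):
    """Sup-norm distance between two value vectors."""
    return max(abs(x - y) for x, y in zip(J, K))


if __name__ == "__main__":
    J_star = exact_value_iteration()
    J = noisy_avi()
    print("reference J*:", J_star)
    print("noisy AVI J :", J)
    print("sup distance:", sup_distance(J, J_star))
```

Because the error term can be any point in a ball of radius EPS_BOUND, the effective mean field T J - J + B(0, EPS_BOUND) is set-valued. In this toy setting one would expect the iterates to stay bounded and settle in a neighborhood of the reference solution whose size scales with EPS_BOUND / (1 - GAMMA).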

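Experimental Results

Research Questions

RQ1: What conditions guarantee almost sure boundedness and convergence of stochastic approximation algorithms with set-valued mean fields?
RQ2: How can a Lyapunov function be explicitly constructed to verify stability and convergence for such algorithms?
RQ3: Can the proposed framework be applied to analyze the stochastic approximation counterpart of approximate value iteration?
RQ4: In a model-free setting, what are the minimal verifiable assumptions needed for convergence with set-valued updates?
RQ5: Is it possible to develop and analyze a stochastic approximation algorithm for finding fixed points of contractive set-valued maps?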

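Key Findings

The paper establishes sufficient conditions that guarantee almost sure convergence of set-valued stochastic approximation algorithms, using explicitly constructible Lyapunov functions.
The proposed conditions are significantly weaker and easier to verify than those of earlier approaches, which broadens their applicability.
The framework yields a complete analysis of the stochastic approximation counterpart of approximate value iteration (AVI), a central dynamic programming method.
The theory enables the first model-free stochastic approximation algorithm for computing fixed points of contractive set-valued maps.
It provides a practical recipe for constructing Lyapunov functions, addressing a major bottleneck in the stability analysis of complex learning algorithms.
As a result, it strengthens the theoretical foundation for deep reinforcement learning algorithms that use deep neural networks for function approximation.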
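
The fixed-point finding can be illustrated with a toy example. The sketch below runs a stochastic approximation iteration x_{n+1} = x_n + a(n)[y_n - x_n + M_{n+1}] with y_n drawn from F(x_n), for a hypothetical interval-valued map F(x) = [0.5x - 0.1, 0.5x + 0.1] on the real line. This map is a contraction in the Hausdorff metric with factor 0.5, and its fixed points (those x with x in F(x)) form the interval [-0.2, 0.2]. The map, the uniform sampling of y_n, the noise, and the step size are assumptions chosen for illustration; the paper's actual algorithm and conditions may differ.

```python
import random


def F(x):
    """Hypothetical set-valued map, returned as an interval (lo, hi)."""
    return (0.5 * x - 0.1, 0.5 * x + 0.1)


def is_fixed_point(x, tol=0.0):
    """Check x in F(x), up to a tolerance on the interval endpoints."""
    lo, hi = F(x)
    return lo - tol <= x <= hi + tol


def set_valued_fixed_point_sa(x0=5.0, num_iters=20000, noise=0.1, seed=0):
    """Noisy fixed-point iteration for a set-valued map (illustrative sketch)."""
    rng = random.Random(seed)
    x = x0
    for n in range(num_iters):
        a_n = 1.0 / (n + 1)             # diminishing step size
        lo, hi = F(x)
        y = rng.uniform(lo, hi)         # one selection y_n from F(x_n)
        m = rng.uniform(-noise, noise)  # zero-mean noise
        x += a_n * (y - x + m)
    return x


if __name__ == "__main__":
    x = set_valued_fixed_point_sa()
    print("final iterate:", x)
    print("approximately a fixed point:", is_fixed_point(x, tol=0.05))
```

Any selection rule for y_n inside F(x_n) would fit the same template; uniform sampling is used here only because it is simple to write down.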

This review was generated by AI and checked by a human editor.