QUICK REVIEW

[Paper Review] Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

R. Srikant, Lei Ying|arXiv (Cornell University)|Feb 3, 2019

Advanced Bandit Algorithms Research|References: 16|Cited by: 42

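One-Sentence Summary

The paper derives finite-time mean-square error bounds for linear stochastic approximation driven by Markovian noise and applies them to TD learning, using the drift of a Lyapunov function (which can also be interpreted through Stein's method) to quantify how the error evolves.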

ABSTRACT

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE). We obtain finite-time bounds on the mean-square error in the case of constant step-size algorithms by considering the drift of an appropriately chosen Lyapunov function. The Lyapunov function can be interpreted either in terms of Stein's method to obtain bounds on steady-state performance or in terms of Lyapunov stability theory for linear ODEs. We also provide a comprehensive treatment of the moments of the square of the 2-norm of the approximation error. Our analysis yields the following results: (i) for a given step-size, we show that the lower-order moments can be made small as a function of the step-size and can be upper-bounded by the moments of a Gaussian random variable; (ii) we show that the higher-order moments beyond a threshold may be infinite in steady-state; and (iii) we characterize the number of samples needed for the finite-time bounds to be of the same order as the steady-state bounds. As a by-product of our analysis, we also solve the open problem of obtaining finite-time bounds for the performance of temporal difference learning algorithms with linear function approximation and a constant step-size, without requiring a projection step or an i.i.d. noise assumption.

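Motivation and Objectives

Establish finite-time error bounds for linear stochastic approximation and TD learning when the noise is not i.i.d. and the algorithm has no projection step.
Derive finite-time mean-square error bounds for constant step-size algorithms using a Lyapunov function and drift analysis.
Characterize the moments of the error, including lower-order moments bounded by Gaussian moments and higher-order moments that may not exist in steady state.
Explain what the results imply for TD(0) and TD(λ) with linear function approximation under Markovian noise.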

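Proposed Method

Set up the recursion Theta_{k+1} = Theta_k + ε (A(X_k) Theta_k + b(X_k)), where X_k is Markovian noise and, as k → ∞, E[A(X_k)] → Ã and E[b(X_k)] → 0.
Use Lyapunov (Stein) drift analysis to bound the mean-square error and relate it to the dynamics of the associated ODE.
Extend the drift framework to all moments of the error, identifying which moments are finite in steady state and which may not be.
Relate the finite-time bounds to steady-state performance and determine the number of samples needed for the two to be of the same order.
Apply the results to TD learning algorithms, showing that finite-time bounds hold without a projection step or an i.i.d. noise assumption.
Discuss the connection to central-limit-theorem behavior as the step-size tends to 0.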
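
To make the recursion above concrete, here is a minimal Python sketch on a toy problem. The two-state Markov chain, the matrices A(x), the offsets b(x), the starting point, and the function name simulate_linear_sa are all illustrative assumptions and are not taken from the paper. They are chosen so that the stationary average of A(x) is -I and the stationary average of b(x) is zero, which places the equilibrium of the averaged ODE at the origin.

```python
import numpy as np

def simulate_linear_sa(eps, num_steps, seed=0):
    """Run Theta_{k+1} = Theta_k + eps * (A(X_k) Theta_k + b(X_k)) on a toy chain.

    Returns the squared 2-norm of the error at every step.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical two-state Markov chain; stationary distribution is (0.5, 0.5).
    P = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    # Hypothetical state-dependent matrices; their stationary average is -I.
    A = np.array([[[-1.5, 0.0], [0.0, -0.5]],
                  [[-0.5, 0.0], [0.0, -1.5]]])
    # Hypothetical state-dependent offsets; their stationary average is zero.
    b = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
    theta = np.array([5.0, -5.0])  # arbitrary starting point
    x = 0
    sq_err = np.empty(num_steps)
    for k in range(num_steps):
        # The equilibrium of the averaged ODE is 0, so the error is theta itself.
        sq_err[k] = theta @ theta
        theta = theta + eps * (A[x] @ theta + b[x])
        x = rng.choice(2, p=P[x])  # next state of the Markov chain
    return sq_err
```

Averaging the tail of the returned array for a larger and a smaller eps (for example 0.2 and 0.01) gives a rough empirical feel for the trade-off the bounds describe: a smaller step-size should leave a smaller steady-state error but takes more samples to get there.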

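Experimental Results

Research Questions

RQ1: What finite-time bounds can be established on the error of linear stochastic approximation algorithms under Markovian noise?
RQ2: How does Lyapunov/Stein-based drift analysis yield bounds on the mean-square error and on higher-order moments?
RQ3: Can these bounds be specialized to TD learning with linear function approximation and a constant step-size, without a projection step?
RQ4: How do the lower-order and higher-order moments of the error behave in steady state?
RQ5: How many samples are needed for the finite-time bounds to reach the same order as the steady-state bounds?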
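
As background for RQ2, the standard Lyapunov argument for a linear ODE can be written out in a few lines. This is a sketch under stated assumptions, not the paper's derivation: it assumes the limiting matrix Ã is Hurwitz (all eigenvalues have negative real part) and that the equilibrium has been shifted to the origin. The symbols P and W are notation introduced here for illustration.

```latex
% Averaged ODE, Lyapunov equation, and quadratic Lyapunov function
\dot{\theta} = \tilde{A}\,\theta, \qquad
\tilde{A}^{\top} P + P \tilde{A} = -I, \qquad
W(\theta) = \theta^{\top} P\, \theta

% Drift of W along trajectories of the ODE
\frac{d}{dt} W(\theta(t))
  = \theta^{\top}\bigl(\tilde{A}^{\top} P + P \tilde{A}\bigr)\theta
  = -\lVert \theta \rVert^{2}
  \le -\frac{1}{\lambda_{\max}(P)}\, W(\theta)
```

The last inequality uses W(θ) ≤ λ_max(P) ‖θ‖², so W decays exponentially along the ODE. A drift analysis of the stochastic recursion tracks how far the one-step change in W deviates from this deterministic decay.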

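Key Findings

For a given constant step-size, the lower-order moments of the error can be made small as a function of the step-size and are upper-bounded by the moments of a Gaussian random variable.
In steady state, higher-order moments beyond a threshold may be infinite, which indicates that the tail of the error distribution does not decay exponentially.
The analysis yields finite-time mean-square error bounds and characterizes the number of samples needed for them to match the order of the steady-state bounds.
It provides finite-time bounds for TD learning with linear function approximation and a constant step-size, without requiring a projection step or an i.i.d. noise assumption.
The results link Lyapunov drift analysis to Stein's method for understanding steady-state performance, and to stability theory for linear ODEs.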
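
The TD learning setting in these findings can be illustrated with a small Python sketch of TD(0) with linear function approximation, a constant step-size, and no projection step. Everything specific here is a hypothetical toy example rather than the paper's setup: the three-state chain P, the rewards R, the feature matrix PHI, the discount factor, and the function names. The helper td_fixed_point solves the averaged linear system directly so the iterate can be compared against it.

```python
import numpy as np

# Hypothetical three-state Markov reward process (illustration only).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
R = np.array([0.0, 1.0, 2.0])      # reward received in each state
PHI = np.array([[1.0, 0.0],        # feature vector of each state
                [1.0, 1.0],
                [1.0, 2.0]])

def td0_constant_step(eps, num_steps, gamma=0.9, seed=0):
    """TD(0) with linear features, constant step-size eps, no projection."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(PHI.shape[1])
    s = 0
    for _ in range(num_steps):
        s_next = rng.choice(3, p=P[s])  # samples come from one Markov trajectory
        td_error = R[s] + gamma * PHI[s_next] @ theta - PHI[s] @ theta
        theta = theta + eps * td_error * PHI[s]
        s = s_next
    return theta

def td_fixed_point(gamma=0.9):
    """Solve A_bar theta + b_bar = 0 for the averaged TD(0) update."""
    pi = np.array([0.25, 0.5, 0.25])  # stationary distribution of P
    D = np.diag(pi)
    A_bar = PHI.T @ D @ (gamma * P - np.eye(3)) @ PHI
    b_bar = PHI.T @ D @ R
    return np.linalg.solve(A_bar, -b_bar)
```

Running td0_constant_step(0.01, 20000) and comparing the result with td_fixed_point() should show the iterate hovering near the fixed point rather than converging to it exactly, which is the constant step-size behavior the bounds are meant to quantify.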

This review was generated by AI and checked by a human editor.