QUICK REVIEW

[Paper Review] Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Maxim Kaledin, Éric Moulines|arXiv (Cornell University)|Feb 4, 2020

Probabilistic and Robust Engineering Design | References: 25 | Cited by: 26

One-Sentence Summary

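This paper gives a finite-time analysis of linear two-timescale stochastic approximation under Markovian noise. It shows that the convergence rate matches the rate under martingale (for example, i.i.d.) noise, and that only the constants depend on the mixing time of the Markov chain. With an appropriate step size schedule, the transient part of the expected error bound is $o(1/k^c)$ with $c>1$, the steady-state part is $\mathcal{O}(1/k)$, and a matching $\Omega(1/k)$ lower bound holds.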

ABSTRACT

Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise, only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $\Omega(1/k)$. A simple numerical experiment is presented to support our theory.
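For orientation, a linear two-timescale SA recursion of the kind the abstract refers to can be written as below. The notation is an assumption made for this review and does not appear in the text above: $\theta_k$ is the slow iterate, $w_k$ the fast iterate, $X_k$ the underlying Markov chain, $\beta_k$ and $\gamma_k$ the slow and fast step sizes, and $\mathrm{err}_k$ a placeholder for the error measure being bounded.

```latex
\begin{align*}
\theta_{k+1} &= \theta_k + \beta_k \bigl( \tilde b_1(X_{k+1}) - \tilde A_{11}(X_{k+1})\,\theta_k - \tilde A_{12}(X_{k+1})\,w_k \bigr), \\
w_{k+1} &= w_k + \gamma_k \bigl( \tilde b_2(X_{k+1}) - \tilde A_{21}(X_{k+1})\,\theta_k - \tilde A_{22}(X_{k+1})\,w_k \bigr),
\qquad \beta_k / \gamma_k \to 0 .
\end{align*}
% Writing A_{ij}, b_i for the stationary means of \tilde A_{ij}, \tilde b_i,
% the target (\theta^\star, w^\star) solves the linear system
\begin{align*}
A_{11}\theta^\star + A_{12} w^\star &= b_1, &
A_{21}\theta^\star + A_{22} w^\star &= b_2 .
\end{align*}
% Shape of the bound stated in the abstract (schematic only):
\[
\mathbb{E}[\mathrm{err}_k] \;\le\;
\underbrace{o(1/k^{c})}_{\text{transient}} \;+\;
\underbrace{\mathcal{O}(1/k)}_{\text{steady state}},
\qquad c > 1 .
\]
```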

Motivation and Objectives

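Establish finite-time error bounds for linear two-timescale stochastic approximation under Markovian noise, a setting that is common in reinforcement learning but hard to analyze.
Close the gap in the theory of convergence rates between dependent (Markovian) noise and independent (i.i.d.) noise.
Derive tight error bounds with matching upper and lower bounds, confirming that the $\mathcal{O}(1/k)$ steady-state rate is optimal.
Validate the theory with a numerical experiment that shows the predicted error decay.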

Proposed Method

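Use a linear-system representation of the two-timescale algorithm to propose a new error decomposition that splits the error into transient and steady-state components.
Construct a time-varying Lyapunov function to track how the error evolves while accounting for the non-i.i.d. nature of Markovian noise.
Use the mixing-time properties of the underlying Markov chain to bound how the dependence affects the convergence constants.
Derive an asymptotic expansion of the expected error, which characterizes the $\mathcal{O}(1/k)$ steady-state term precisely.
Prove an $\Omega(1/k)$ lower bound that matches the upper bound, confirming that the rate is optimal.
Choose a step size schedule under which the transient error decays as $o(1/k^c)$ with $c>1$.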
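To make the setting concrete, the following is a minimal sketch of a scalar linear two-timescale SA loop driven by a two-state Markov chain. It is a toy illustration, not the paper's experiment: the coefficients, the chain, the step sizes $1/(k+10)$ and $1/(k+10)^{2/3}$, and the function name `run_two_timescale_sa` are all assumptions chosen for this example.

```python
import numpy as np

def run_two_timescale_sa(num_iters=20000, seed=0):
    """Toy scalar linear two-timescale SA driven by a two-state Markov chain."""
    rng = np.random.default_rng(seed)

    # Hypothetical mean-field coefficients (illustrative, not from the paper).
    a11, a12, a21, a22 = 1.0, 0.5, 0.5, 1.0
    b1, b2 = 1.0, 1.0
    # Fixed point of the mean-field linear system.
    theta_star, w_star = np.linalg.solve(
        np.array([[a11, a12], [a21, a22]]), np.array([b1, b2])
    )

    # Symmetric two-state chain: the stationary distribution is (0.5, 0.5),
    # so the state-dependent perturbation has zero stationary mean.
    p_stay = 0.7
    perturbation = np.array([-1.0, 1.0])
    state = 0

    theta, w = 0.0, 0.0
    sq_err = np.empty(num_iters)
    for k in range(num_iters):
        # Markovian (non-i.i.d.) noise: the next state depends on the current one.
        if rng.random() > p_stay:
            state = 1 - state
        xi = perturbation[state]

        beta = 1.0 / (k + 10)              # slow step size
        gamma = 1.0 / (k + 10) ** (2 / 3)  # fast step size, beta / gamma -> 0

        # Both updates use the old iterates and the same noisy sample.
        theta_next = theta + beta * (b1 + xi - a11 * theta - a12 * w)
        w_next = w + gamma * (b2 + xi - a21 * theta - a22 * w)
        theta, w = theta_next, w_next
        sq_err[k] = (theta - theta_star) ** 2

    return theta, w, theta_star, w_star, sq_err
```

Calling `run_two_timescale_sa()` returns the final iterates, the mean-field fixed point, and the squared-error trajectory of the slow iterate. Raising `p_stay` makes the chain mix more slowly, which is one way to see the effect of mixing on the constants in this toy setting.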

Experimental Results

Research Questions

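RQ1: Is the convergence rate of linear two-timescale stochastic approximation worse under Markovian noise than under i.i.d. noise?
RQ2: Can tight finite-time error bounds be derived for two-timescale algorithms when the noise process is a Markov chain?
RQ3: What is the exact asymptotic behavior of the expected error, and is the $\mathcal{O}(1/k)$ rate tight?
RQ4: How does the mixing time of the Markov chain affect the convergence constants in the error bound?
RQ5: Can a step size schedule be designed that speeds up the decay of the transient error while keeping the $\mathcal{O}(1/k)$ steady-state rate?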
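RQ4 refers to the mixing time of the chain. As a rough illustration, the sketch below computes the relaxation time $1/(1-|\lambda_2|)$ of a finite transition matrix, a standard proxy that controls the mixing time of reversible chains. The helper name `relaxation_time` and the example matrices are assumptions for this review, and the paper may quantify mixing differently.

```python
import numpy as np

def relaxation_time(P):
    """Return 1 / (1 - |lambda_2|) for a row-stochastic transition matrix P.

    |lambda_2| is the second-largest eigenvalue modulus; a value close to 1
    means the chain forgets its initial state slowly.
    """
    P = np.asarray(P, dtype=float)
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]  # descending order
    second_largest = moduli[1]
    return 1.0 / (1.0 - second_largest)

# Two hypothetical symmetric two-state chains: the second one is "stickier".
fast_chain = [[0.7, 0.3], [0.3, 0.7]]
slow_chain = [[0.95, 0.05], [0.05, 0.95]]
t_fast = relaxation_time(fast_chain)
t_slow = relaxation_time(slow_chain)
```

Here the stickier chain has the larger relaxation time, which matches the intuition that slower mixing worsens the constants in the bound without changing the rate.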

Key Findings

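With an appropriate step size schedule, the transient term in the finite-time error bound is $o(1/k^c)$ with $c>1$.
The steady-state error term is $\mathcal{O}(1/k)$, matching the best known rate under i.i.d. noise.
The $\mathcal{O}(1/k)$ rate is optimal, because a matching $\Omega(1/k)$ lower bound is established.
The mixing time of the Markov chain affects only the constants in the error bound, not the convergence rate.
The numerical experiment agrees with the predicted asymptotic error decay.
Overall, the analysis shows that Markovian noise does not slow convergence; it only worsens the constants through the mixing time.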
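One common way to check a claimed $\mathcal{O}(1/k)$ rate against an error curve is to fit a line to the tail of the curve on a log-log scale. The sketch below does this on synthetic data of the form $2/k + 50/k^2$, which is made up for illustration and is not a result from the paper; the helper name `empirical_decay_exponent` is also hypothetical.

```python
import numpy as np

def empirical_decay_exponent(errors, tail_fraction=0.5):
    """Estimate p in errors[k] ~ C / k**p from the tail of an error curve."""
    errors = np.asarray(errors, dtype=float)
    n = errors.size
    start = int(n * (1.0 - tail_fraction))
    k = np.arange(start + 1, n + 1, dtype=float)  # 1-based iteration index
    # Least-squares line through (log k, log error); its slope is -p.
    slope, _ = np.polyfit(np.log(k), np.log(errors[start:]), 1)
    return -slope

# Synthetic curve: a 1/k steady-state term plus a faster-decaying transient.
k = np.arange(1, 100001, dtype=float)
synthetic_errors = 2.0 / k + 50.0 / k**2
exponent = empirical_decay_exponent(synthetic_errors)
```

On this synthetic curve the fitted exponent is close to 1, because the $1/k^2$ term is negligible in the tail. On a simulated error curve the input should be an average over many independent runs, since a single trajectory is too noisy for a reliable fit.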


This review was generated by AI and checked by human editors.