QUICK REVIEW

[Paper Review] On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent

Xingwen Zhang, Jeff Clune | arXiv (Cornell University) | Dec 18, 2017

Reinforcement Learning in Robotics | References: 2 | Cited by: 38

ONE-SENTENCE SUMMARY

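Through a series of MNIST-based experiments, this paper studies the relationship between the OpenAI evolution strategy (ES) and stochastic gradient descent (SGD), and shows that ES can reach 99% test accuracy on MNIST, higher than any previously published result for an evolutionary method. It finds that ES gradient estimates correlate strongly with SGD gradients and proposes an SGD-based proxy that predicts ES performance at different population sizes.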

ABSTRACT

Because stochastic gradient descent (SGD) has shown promise optimizing neural networks with millions of parameters and few if any alternatives are known to exist, it has moved to the heart of leading approaches to reinforcement learning (RL). For that reason, the recent result from OpenAI showing that a particular kind of evolution strategy (ES) can rival the performance of SGD-based deep RL methods with large neural networks provoked surprise. This result is difficult to interpret in part because of the lingering ambiguity on how ES actually relates to SGD. The aim of this paper is to significantly reduce this ambiguity through a series of MNIST-based experiments designed to uncover their relationship. As a simple supervised problem without domain noise (unlike in most RL), MNIST makes it possible (1) to measure the correlation between gradients computed by ES and SGD and (2) then to develop an SGD-based proxy that accurately predicts the performance of different ES population sizes. These innovations give a new level of insight into the real capabilities of ES, and lead also to some unconventional means for applying ES to supervised problems that shed further light on its differences from SGD. Incorporating these lessons, the paper concludes by demonstrating that ES can achieve 99% accuracy on MNIST, a number higher than any previously published result for any evolutionary method. While not by any means suggesting that ES should substitute for SGD in supervised learning, the suite of experiments herein enables more informed decisions on the application of ES within RL and other paradigms.

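RESEARCH MOTIVATION AND GOALS

Clarify the relationship between the OpenAI evolution strategy (ES) and stochastic gradient descent (SGD), which remains ambiguous despite the success of ES in deep reinforcement learning.
Investigate, by measuring gradient correlation in a low-noise supervised setting, whether ES behaves like a finite-difference gradient approximator or like a distinct optimization paradigm.
Develop an SGD-based proxy that accurately predicts ES performance on MNIST at different population sizes.
Explore unconventional applications of ES to supervised learning that highlight its differences from SGD and reveal its capabilities.
Demonstrate that ES can reach state-of-the-art performance among evolutionary methods on MNIST, challenging the assumption that it is limited on high-dimensional, deep networks.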

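PROPOSED METHOD

Run controlled MNIST experiments in a supervised setting to minimize domain noise and isolate the optimization dynamics.
Measure the correlation between the gradient computed by ES and the gradient computed by standard backpropagation (SGD) at the same network weights.
Build an SGD-based proxy that predicts ES performance by estimating the expected gradient of the perturbed population.
Use the proxy to predict the best ES population size without running full ES trials, and validate its accuracy across several configurations.
Apply ES to supervised learning in unconventional ways, such as limited perturbation and training without mini-batches, to analyze how its behavior differs from SGD.
Analyze how the perturbation standard deviation (σ) affects ES performance and its departure from finite-difference approximation, particularly in noisy settings.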
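
The gradient-correlation measurement described above can be sketched as follows. This is a minimal illustration, not the paper's code: it swaps the MNIST network for a small logistic-regression model on synthetic data so that the exact gradient is available in closed form, and all names (`loss`, `true_gradient`, `es_gradient`, `mean_correlation`) and settings (mirrored sampling, `sigma=0.02`, 50 parameters) are assumptions made for the example.

```python
import numpy as np


def loss(theta, X, y):
    """Mean logistic loss of a linear classifier (toy stand-in for the MNIST network)."""
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)


def true_gradient(theta, X, y):
    """Closed-form gradient of `loss`; plays the role of backpropagation."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - y) / len(y)


def es_gradient(theta, X, y, pop_size, sigma, rng):
    """ES-style gradient estimate from pop_size // 2 mirrored Gaussian perturbations."""
    eps = rng.standard_normal((pop_size // 2, theta.size))
    f_plus = np.array([loss(theta + sigma * e, X, y) for e in eps])
    f_minus = np.array([loss(theta - sigma * e, X, y) for e in eps])
    # Weight each perturbation direction by the loss difference it produced.
    return ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)


def mean_correlation(pop_size, trials=20, dim=50, sigma=0.02, seed=0):
    """Average Pearson correlation between ES estimates and the exact gradient."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((256, dim))                       # synthetic inputs
    y = (X @ rng.standard_normal(dim) > 0).astype(float)      # synthetic labels
    theta = 0.1 * rng.standard_normal(dim)                    # weights to probe
    g = true_gradient(theta, X, y)
    corrs = [
        np.corrcoef(es_gradient(theta, X, y, pop_size, sigma, rng), g)[0, 1]
        for _ in range(trials)
    ]
    return float(np.mean(corrs))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(mean_correlation(n), 3))
```

In this toy setting the correlation rises with population size, which matches the intuition that a larger population yields an estimate closer to the true gradient. How closely this tracks the MNIST measurements in the paper is not something the sketch can show.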

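EXPERIMENTAL RESULTS

Research Questions

RQ1: In a low-noise supervised setting, how strongly does the gradient estimated by OpenAI's ES correlate with the true gradient computed by backpropagation (SGD)?
RQ2: Can an SGD-based proxy accurately predict ES performance on MNIST at different population sizes?
RQ3: As the perturbation standard deviation (σ) varies, how does ES differ in kind from a finite-difference gradient approximator?
RQ4: To what extent can ES be applied effectively to supervised learning, and what does this reveal about its behavior in reinforcement learning?
RQ5: How smooth are ES learning curves, especially without mini-batches, compared with those of SGD, and what does this imply for reinforcement learning applications?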
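
For reference when reading RQ1 and RQ3, the two estimators being compared can be written as below. The notation ($F$ for the objective, $\theta$ for the weights, $\epsilon_i$ for Gaussian perturbations, $n$ for the population size, $h$ and $e_j$ for the finite-difference step and coordinate direction) follows the common presentation of OpenAI ES and is an assumption here, since this review does not fix notation.

```latex
% ES: gradient of the Gaussian-smoothed objective, estimated from n samples
\nabla_\theta \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\!\left[ F(\theta + \sigma \epsilon) \right]
  = \frac{1}{\sigma} \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\!\left[ F(\theta + \sigma \epsilon)\, \epsilon \right]
  \approx \frac{1}{n \sigma} \sum_{i=1}^{n} F(\theta + \sigma \epsilon_i)\, \epsilon_i

% Central finite differences along coordinate j with step h
\frac{\partial F}{\partial \theta_j}(\theta)
  \approx \frac{F(\theta + h\, e_j) - F(\theta - h\, e_j)}{2h}
```

As $\sigma \to 0$ the smoothed objective approaches $F$ itself, so the ES estimate targets the same gradient as finite differences. For larger $\sigma$ the two targets differ, which is the regime RQ3 asks about.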

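Key Findings

In the MNIST setting, the gradient estimated by ES correlates strongly with the true gradient computed by backpropagation (SGD), indicating that ES approximates a meaningful descent direction.
An SGD-based proxy was developed that accurately predicts performance at different ES population sizes, so performance can be estimated without running full ES trials.
With σ held fixed during a run, ES departs further from a finite-difference approximator at larger values of σ, suggesting that it optimizes the perturbation distribution itself and not only the weight vector.
Without mini-batches, ES produces learning curves that are markedly smoother than those of SGD, pointing to potential advantages in stability and robustness to noise.
ES reaches 99% test accuracy on MNIST, higher than any previously published result for an evolutionary method, demonstrating its capability on large, deep networks.
The results indicate that ES is not merely a gradient approximator but a distinct optimization paradigm with its own properties, which is most evident when it is combined with large-scale parallel computation and careful hyperparameter tuning.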
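
The dependence on σ noted in the findings can be illustrated on a one-dimensional toy objective. This is a hedged sketch under stated assumptions, not an experiment from the paper: the objective `f(x) = sin(3x)`, the helper names, and the sample count are all chosen for the example. For this particular `f`, the gradient of the Gaussian-smoothed objective has a closed form, so the ES estimate can be checked against both the true gradient and the smoothed one.

```python
import numpy as np


def f(x):
    """Toy 1-D objective (assumed for illustration; not MNIST)."""
    return np.sin(3.0 * x)


def f_prime(x):
    """Exact gradient of f, the target of a finite-difference estimate."""
    return 3.0 * np.cos(3.0 * x)


def smoothed_gradient(x, sigma):
    """Exact gradient of E[f(x + sigma * eps)], eps ~ N(0, 1), for f = sin(3x)."""
    return 3.0 * np.cos(3.0 * x) * np.exp(-0.5 * (3.0 * sigma) ** 2)


def es_gradient(x, sigma, n_pairs, rng):
    """ES estimate from mirrored Gaussian perturbations of standard deviation sigma."""
    eps = rng.standard_normal(n_pairs)
    diffs = f(x + sigma * eps) - f(x - sigma * eps)
    return float(np.mean(diffs * eps) / (2.0 * sigma))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 0.2
    for sigma in (0.01, 0.3, 1.0):
        est = es_gradient(x, sigma, 200_000, rng)
        print(f"sigma={sigma}: ES={est:.3f}  "
              f"smoothed={smoothed_gradient(x, sigma):.3f}  true={f_prime(x):.3f}")
```

With a small σ the ES estimate sits close to the true gradient, as a finite-difference estimate would. With a large σ it instead tracks the gradient of the smoothed objective, which is one way to read the claim that ES optimizes over the perturbation distribution.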

This review was generated by AI and checked by human editors.