QUICK REVIEW

[Paper Review] Differentiable plasticity: training plastic neural networks with backpropagation

Thomas Miconi, Jeff Clune, Kenneth O. Stanley | arXiv (Cornell University) | Apr 6, 2018

Advanced Memory and Neural Computing | Cited by 92

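One-Sentence Summary

This paper uses gradient descent to train the plasticity of Hebbian connections (Hebbian traces), enabling fast lifelong learning on pattern memorization, one-shot Omniglot classification, and a maze reinforcement-learning task, generally outperforming non-plastic baselines.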

ABSTRACT

How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains: synaptic plasticity, carefully tuned by evolution to produce efficient lifelong learning. We show that plasticity, just like connection weights, can be optimized by gradient descent in large (millions of parameters) recurrent networks with Hebbian plastic connections. First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional 1000+ pixels natural images not seen during training. Crucially, traditional non-plastic recurrent networks fail to solve this task. Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead. Finally, in reinforcement learning settings, plastic networks outperform a non-plastic equivalent in a maze exploration task. We conclude that differentiable plasticity may provide a powerful novel approach to the learning-to-learn problem.

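Motivation and Goals

Take inspiration from biological synaptic plasticity as a mechanism for lifelong learning.
Introduce a differentiable plasticity framework in which each connection has a fixed component and a plastic component.
Demonstrate that plasticity parameters can be optimized by gradient descent on several tasks (pattern memorization, Omniglot, reinforcement learning).
Show that learned plasticity can outperform non-plastic counterparts on complex memory tasks and is competitive on meta-learning benchmarks.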

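Proposed Method

Define a network in which each connection has a fixed weight w_ij and a plastic component alpha_ij * Hebb_ij, where Hebb_ij is a Hebbian trace that tracks pre- and postsynaptic activity.
Update Hebb_ij recurrently, e.g. Hebb_ij(t+1) = eta * x_i(t-1) * x_j(t) + (1 - eta) * Hebb_ij(t) (or, alternatively, a variant based on Oja's rule, which has no decay term).
The total effective weight is w_ij + alpha_ij * Hebb_ij, so the fixed and plastic pathways coexist.
Optimize w_ij and alpha_ij by backpropagating through episodes; eta (the plasticity learning rate) is shared across connections and is also learned.
Evaluate on four tasks: binary pattern memorization, natural image memorization (CIFAR-10), Omniglot one-shot classification, and a maze exploration RL task.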
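
The update rule and effective weight described above can be sketched as a single forward step. This is only an illustrative NumPy sketch, not the authors' implementation: the function name `plastic_step`, the tanh nonlinearity, and the `[i, j]` indexing convention (from neuron i to neuron j) are assumptions made here. It shows the forward pass only; in the paper w_ij, alpha_ij, and eta are trained by backpropagating through steps like this one, which would need an automatic-differentiation framework.

```python
import numpy as np

def plastic_step(x_prev, w, alpha, hebb, eta):
    """One time step of a fully recurrent plastic layer (illustrative sketch).

    x_prev: activations at time t-1, shape (n,)
    w, alpha, hebb: shape (n, n), entry [i, j] belongs to the connection i -> j
    eta: scalar plasticity learning rate, shared by all connections
    """
    # Effective weight = fixed part + plastic part.
    w_eff = w + alpha * hebb
    # x_j(t) = tanh(sum_i w_eff[i, j] * x_i(t-1)); tanh is an assumption here.
    x = np.tanh(x_prev @ w_eff)
    # Decaying Hebbian trace:
    # Hebb_ij(t+1) = eta * x_i(t-1) * x_j(t) + (1 - eta) * Hebb_ij(t)
    hebb_next = eta * np.outer(x_prev, x) + (1.0 - eta) * hebb
    return x, hebb_next

# Hypothetical usage: random parameters, traces start at zero for the episode.
rng = np.random.default_rng(0)
n = 4
w = 0.1 * rng.standard_normal((n, n))
alpha = 0.1 * rng.standard_normal((n, n))
hebb = np.zeros((n, n))
x = rng.standard_normal(n)
for _ in range(3):
    x, hebb = plastic_step(x, w, alpha, hebb, eta=0.1)
```

Note that only `hebb` changes within an episode; `w`, `alpha`, and `eta` stay fixed during the episode and would be updated between episodes by the outer gradient-descent loop.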

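Experimental Results

Research Questions

RQ1: Can differentiable plasticity be trained by backpropagation at scale (millions of parameters)?
RQ2: Does learned plasticity enable fast memory formation and pattern reconstruction, rather than relying on fixed weights alone?
RQ3: Is differentiable plasticity competitive with established meta-learning methods on Omniglot and reinforcement-learning tasks?
RQ4: How does per-connection plasticity (an independent alpha_ij for each connection) compare with shared plasticity in terms of performance?
RQ5: Can plastic networks outperform non-plastic recurrent models (RNN/LSTM) on complex memory tasks?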
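
To make the scale in RQ1 and the comparison in RQ4 concrete, the arithmetic below counts trainable parameters for a fully recurrent plastic layer. It is a sketch under stated assumptions: the helper `plastic_param_count` is hypothetical, and the layer size assumes one neuron per pixel of a 32x32 image plus one bias neuron, which is one way to arrive at the "more than two million parameters" figure quoted in the abstract.

```python
def plastic_param_count(n_neurons, per_connection_alpha=True):
    """Trainable parameters of a fully recurrent plastic layer (hypothetical helper)."""
    n_weights = n_neurons * n_neurons  # one fixed weight w_ij per connection
    # Either one alpha_ij per connection, or a single alpha shared by all.
    n_alpha = n_neurons * n_neurons if per_connection_alpha else 1
    n_eta = 1                          # shared plasticity learning rate
    return n_weights + n_alpha + n_eta

n = 32 * 32 + 1  # assumption: 1024 pixel neurons plus one bias neuron
print(plastic_param_count(n, per_connection_alpha=True))   # 2101251
print(plastic_param_count(n, per_connection_alpha=False))  # 1050627
```

Under these assumptions, per-connection plasticity roughly doubles the parameter count relative to a shared alpha, since each connection carries both a w_ij and an alpha_ij.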

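Key Findings

Networks with differentiable plasticity solve high-dimensional pattern memorization, which non-plastic RNNs and LSTMs struggle with, and they typically learn far faster (e.g., about 2,000 training episodes for the plastic network vs. about 500,000 for the non-plastic one).
In the natural image memorization task, learned plasticity produces structured Hebb matrices and outperforms fixed-plasticity baselines; an independent alpha per connection improves performance over a shared alpha.
With plastic connections, Omniglot 5-way 1-shot classification reaches 98.3% accuracy (95% confidence interval ±0.80), competitive with a range of meta-learning methods at a modest parameter overhead.
In a maze reinforcement-learning task, differentiable plasticity outperforms both non-plastic networks and networks with homogeneous plasticity, which points to the benefit of tailoring plasticity per connection.
Overall, the study shows that gradient descent can optimize the plasticity rule itself, extending meta-learning and memory-augmented computation beyond conventional fixed-weight networks.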
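
As a rough illustration of why a Hebbian trace can act as fast memory in the pattern-memorization finding, the toy sketch below stores three bipolar patterns in the trace and reconstructs one of them from a half-erased cue. Everything here is an assumption for illustration: the parameters are hand-set (w = 0, alpha = 1, eta = 0.5) rather than trained by gradient descent, the patterns are tiny and mutually orthogonal, and the helper names `store` and `recall` are hypothetical.

```python
import numpy as np

def store(patterns, eta):
    """Accumulate a decaying Hebbian trace while each pattern is clamped onto the neurons."""
    n = patterns.shape[1]
    hebb = np.zeros((n, n))
    for p in patterns:
        # With clamping, pre- and postsynaptic activity both equal the pattern.
        hebb = eta * np.outer(p, p) + (1.0 - eta) * hebb
    return hebb

def recall(cue, w, alpha, hebb):
    """One recurrent step from a degraded cue through the effective weights."""
    return np.tanh(cue @ (w + alpha * hebb))

# Three mutually orthogonal bipolar patterns (chosen by hand for this toy case).
patterns = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
], dtype=float)

n = patterns.shape[1]
w = np.zeros((n, n))     # hand-set: no fixed weights
alpha = np.ones((n, n))  # hand-set: every connection fully plastic
hebb = store(patterns, eta=0.5)

cue = patterns[1].copy()
cue[n // 2:] = 0.0       # erase the second half of the pattern
reconstruction = np.sign(recall(cue, w, alpha, hebb))
print(np.array_equal(reconstruction, patterns[1]))  # True
```

The recall works here because the cue overlaps only with the stored pattern it came from; with non-orthogonal or higher-dimensional inputs, hand-set parameters would not be expected to suffice, which is where learning w, alpha, and eta comes in.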

This review was generated by AI and checked by a human editor.