QUICK REVIEW

[Paper Review] Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

Flood Sung, Li Zhang | arXiv (Cornell University) | Jun 29, 2017

Reinforcement Learning in Robotics | References: 31 | Cited by: 96

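One-Sentence Summary

The paper proposes a meta-critic framework that learns a task- and actor-conditioned critic to guide multiple actors in reinforcement and supervised learning, adapting quickly from a few examples and benefiting from semi-supervised data.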

ABSTRACT

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.

Motivation and Objectives

Make learning-to-learn perform well from only a few examples in both RL and supervised learning.
Propose a global meta-critic that can criticise any actor solving any task by conditioning on task and actor.
Introduce a task-actor encoder to generate a task-actor embedding for conditioning the meta-critic.
Enable knowledge transfer that leverages unlabelled data through a semi-supervised supervision signal.
Demonstrate sample-efficient learning and robust transfer across multiple experimental settings.

Proposed Method

Define a meta-critic comprising a meta-value network (MVN) and a task-actor encoder (TAEN).
Use a task-actor embedding z_t = C_ω(L_t^k) to condition the critic on the current task and actor.
TAEN reads a learning trace L_t^k = [(s_{t-k}, a_{t-k}, r_{t-k}), ..., (s_{t-1}, a_{t-1}, r_{t-1})], the k most recent state-action-reward triples, to produce z_t.
Train actors across multiple tasks with the meta-critic providing supervision via Q_φ(s_t, a_t, z_t) and TD-like updates.
Extend the framework to discrete and continuous action RL, and to a supervised learning setting via a one-step actor-environment game where reward is negative loss.
Leverage unlabelled data during meta-testing by using the meta-critic’s supervision without ground-truth labels.
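
The conditioning described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the single dense layers, the mean-pooling encoder (a stand-in for whatever sequence encoder the paper uses), the dimensions, and the names taen_embed, mvn_q, and td_target are all hypothetical. It shows only how a learning trace becomes z_t, how Q_φ(s_t, a_t, z_t) consumes it, and what a one-step TD-style target looks like.

```python
import math
import random

random.seed(0)

def init_layer(n_in, n_out):
    """Random dense layer: one weight row per output unit, zero biases."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def taen_embed(trace, layer):
    """Encoder stand-in: encode each (s, a, r) triple, then mean-pool into z_t."""
    encoded = [[math.tanh(v) for v in linear(layer, list(s) + list(a) + [r])]
               for s, a, r in trace]
    return [sum(col) / len(encoded) for col in zip(*encoded)]

def mvn_q(state, action, z, layer):
    """Value-network stand-in: scalar Q(s, a, z) conditioned on the embedding z."""
    return linear(layer, list(state) + list(action) + list(z))[0]

def td_target(reward, next_q, gamma=0.99, done=False):
    """One-step TD target: r if the episode ended, else r + gamma * Q(s', a', z')."""
    return reward if done else reward + gamma * next_q

# Hypothetical sizes, chosen only for the example.
STATE_DIM, ACTION_DIM, EMBED_DIM, K = 4, 1, 3, 5
taen = init_layer(STATE_DIM + ACTION_DIM + 1, EMBED_DIM)
mvn = init_layer(STATE_DIM + ACTION_DIM + EMBED_DIM, 1)

# A learning trace of the K most recent (state, action, reward) triples.
trace = [([random.random() for _ in range(STATE_DIM)], [random.random()], random.random())
         for _ in range(K)]
z_t = taen_embed(trace, taen)
q = mvn_q([0.1, 0.0, -0.2, 0.3], [1.0], z_t, mvn)
target = td_target(reward=1.0, next_q=q)
```

Because the same value network is reused for every task and actor, only z_t changes when the task or the actor changes. That is how a single critic can serve many actors.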

Experimental Results

Research Questions

RQ1: Can a single meta-critic, conditioned on task and actor, effectively supervise diverse actors across multiple tasks in RL and SL?
RQ2: Does task conditioning via a task-actor encoder enable robust transfer in multi-task meta-learning with diverse task distributions?
RQ3: Can semi-supervised data be exploited during meta-testing to further improve sample efficiency?
RQ4: How does meta-critic guidance compare to existing meta-learning approaches (e.g., MAML) across SL and RL benchmarks?
RQ5: What is the impact of using a shared meta-critic on rapid adaptation to new tasks with few trials or demonstrations?

Key Findings

The meta-critic framework enables rapid adaptation for new tasks in both RL and supervised learning settings.
The TAEN-embedded task conditioning allows the critic to generalize across diverse task distributions, improving performance on mixture tasks where single-prior methods struggle.
In supervised learning, the meta-critic can supervise learning from few labeled examples and also leverage unlabelled data during meta-testing.
Across RL experiments (dependent multi-armed bandits and cartpole), Meta-Critic outperforms standard, All+FT, and MAML baselines in sample-efficient learning and final performance.
The learned TAEN embeddings reflect task structure (e.g., cartpole pole length) without explicit exposure to the task parameter, indicating meaningful task manifolds.
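
The supervised-learning finding can be made concrete with a toy one-step game in which the reward is the negative loss. This is a hedged sketch rather than the paper's setup: the critic here is a hand-written closed form, not a learned network, and the task embedding is a least-squares slope rather than an encoder output. The names reward, task_embedding, critic, and actor_step are hypothetical. The point is that once a critic scores predictions, the actor can keep improving on inputs that have no labels.

```python
def reward(y_pred, y_true):
    """One-step game: the reward is the negative squared-error loss."""
    return -(y_pred - y_true) ** 2

def task_embedding(labelled):
    """Encoder stand-in (assumption): least-squares slope z of y = z * x."""
    num = sum(x * y for x, y in labelled)
    den = sum(x * x for x, _ in labelled)
    return num / den

def critic(x, y_pred, z):
    """Critic stand-in (assumption): predicted reward for y_pred on x under task z."""
    return -(y_pred - z * x) ** 2

def actor_step(w, xs, z, lr=0.05, eps=1e-4):
    """Gradient ascent on the mean critic value over xs; no labels are used."""
    def mean_q(weight):
        return sum(critic(x, weight * x, z) for x in xs) / len(xs)
    grad = (mean_q(w + eps) - mean_q(w - eps)) / (2 * eps)  # central difference
    return w + lr * grad

labelled = [(1.0, 2.0), (2.0, 4.0)]   # few-shot labelled examples for the task y = 2x
unlabelled = [0.5, 1.5, 2.5, 3.0]     # inputs without ground-truth labels
z = task_embedding(labelled)          # task information comes from the labelled pairs

w = 0.0                               # actor: y_pred = w * x
for _ in range(200):
    w = actor_step(w, unlabelled, z)  # supervision comes from the critic alone
```

In this toy the actor weight w moves toward the slope implied by the labelled pairs, even though every update is computed on unlabelled inputs.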


This review was generated by AI and reviewed by human editors.