QUICK REVIEW

[Paper Review] Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

Flood Sung, Li Zhang|arXiv (Cornell University)|Jun 29, 2017
Reinforcement Learning in Robotics|References: 31|Cited by: 96
One-Sentence Summary

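The paper proposes a meta-critic framework that learns a task- and actor-conditioned critic to guide multiple actors in both reinforcement learning and supervised learning, adapt quickly from a few examples, and benefit from semi-supervised data.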

ABSTRACT

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.

Motivation and Objectives

  • Motivate learning-to-learn as a way to perform well from only a few examples in both RL and supervised learning.
  • Propose a global meta-critic that can criticise any actor solving any task by conditioning on task and actor.
  • Introduce a task-actor encoder to generate a task-actor embedding for conditioning the meta-critic.
  • Enable knowledge transfer that leverages unlabelled data through a semi-supervised supervision signal.
  • Demonstrate sample-efficient learning and robust transfer across multiple experimental settings.

Proposed Method

  • Define a meta-critic comprising a meta-value network (MVN) and a task-actor encoder (TAEN).
  • Use a task-actor embedding z_t = C_ω(L_{t−k}) to condition the critic on the current task and actor.
  • TAEN reads the learning trace L_{t−k} = [(s_{t−k}, a_{t−k}, r_{t−k}), ..., (s_{t−1}, a_{t−1}, r_{t−1})], the k most recent state-action-reward triples, to produce z_t.
  • Train actors across multiple tasks with the meta-critic providing supervision via Q_φ(s_t, a_t, z_t) and TD-like updates.
  • Extend the framework to discrete and continuous action RL, and to a supervised learning setting via a one-step actor-environment game where reward is negative loss.
  • Leverage unlabelled data during meta-testing by using the meta-critic’s supervision without ground-truth labels.
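
The pieces listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses neural networks for both TAEN and MVN, whereas this sketch assumes a mean-pooling encoder and a linear critic purely to keep the example dependency-free. All function names, dimensions, and weight initialisations here are hypothetical.

```python
import math
import random

def encode_trace(trace, W):
    """Toy stand-in for the TAEN C_omega (assumption: mean-pool, then tanh(W x)).

    Each trace entry is (state_vector, action, reward)."""
    flat = [list(s) + [a, r] for (s, a, r) in trace]
    k, dim = len(flat), len(flat[0])
    mean = [sum(step[i] for step in flat) / k for i in range(dim)]
    return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in W]

def q_value(s, a, z, phi):
    """Toy stand-in for the MVN Q_phi(s, a, z): linear in the concatenation [s, a, z]."""
    x = list(s) + [a] + list(z)
    return sum(p * xi for p, xi in zip(phi, x))

def td_target(r, gamma, q_next):
    """One-step TD target r + gamma * Q(s', a', z')."""
    return r + gamma * q_next

def critic_loss(q, target):
    """Squared TD error that the critic would minimise."""
    return (q - target) ** 2

def supervised_reward(y_pred, y_true):
    """One-step actor-environment game: reward is the negative loss
    (squared error is an assumption chosen for illustration)."""
    return -(y_pred - y_true) ** 2

rng = random.Random(0)
STATE_DIM, EMBED_DIM = 2, 3  # hypothetical sizes
W = [[rng.uniform(-1, 1) for _ in range(STATE_DIM + 2)] for _ in range(EMBED_DIM)]
phi = [rng.uniform(-1, 1) for _ in range(STATE_DIM + 1 + EMBED_DIM)]

# Learning trace of the k = 3 most recent (s, a, r) triples.
trace = [([0.1, -0.2], 1.0, 0.0), ([0.0, 0.3], 0.0, 1.0), ([0.2, 0.1], 1.0, 1.0)]
z_t = encode_trace(trace, W)

q_now = q_value([0.2, 0.1], 1.0, z_t, phi)
q_next = q_value([0.3, 0.0], 0.0, z_t, phi)
loss = critic_loss(q_now, td_target(1.0, 0.9, q_next))
```

In this sketch the actor would be updated to increase `q_now`, while the critic and encoder weights would be updated to decrease `loss`. In the supervised case, `supervised_reward` takes the place of the environment reward, and the game ends after one step, so there is no bootstrapped next-step term.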

Experimental Results

Research Questions

  • RQ1: Can a single meta-critic, conditioned on task and actor, effectively supervise diverse actors across multiple tasks in RL and SL?
  • RQ2: Does task conditioning via a task-actor encoder enable robust transfer in multi-task meta-learning with diverse task distributions?
  • RQ3: Can semi-supervised data be exploited during meta-testing to further improve sample efficiency?
  • RQ4: How does meta-critic guidance compare to existing meta-learning approaches (e.g., MAML) across SL and RL benchmarks?
  • RQ5: What is the impact of using a shared meta-critic on rapid adaptation to new tasks with few trials or demonstrations?

Key Findings

  • The meta-critic framework enables rapid adaptation for new tasks in both RL and supervised learning settings.
  • The TAEN-embedded task conditioning allows the critic to generalize across diverse task distributions, improving performance on mixture tasks where single-prior methods struggle.
  • In supervised learning, the meta-critic can supervise learning from few labeled examples and also leverage unlabelled data during meta-testing.
  • Across RL experiments (dependent multi-armed bandits and cartpole), Meta-Critic outperforms standard, All+FT, and MAML baselines in sample-efficient learning and final performance.
  • The learned TAEN embeddings reflect task structure (e.g., cartpole pole length) without explicit exposure to the task parameter, indicating meaningful task manifolds.


This review was generated by AI and reviewed by a human editor.