[Paper Summary] Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
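The paper proposes a meta-critic framework that learns a task- and actor-conditioned critic. The critic guides multiple actors in both reinforcement and supervised learning, adapts quickly from a few examples, and benefits from semi-supervised data.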
We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.
Motivation and Objectives
- Enable learning-to-learn that performs well from only a few examples in both RL and supervised learning.
- Propose a global meta-critic that can criticise any actor solving any task by conditioning on task and actor.
- Introduce a task-actor encoder to generate a task-actor embedding for conditioning the meta-critic.
- Enable knowledge transfer that leverages unlabelled data, with the meta-critic supplying the supervision signal in semi-supervised settings.
- Demonstrate sample-efficient learning and robust transfer across multiple experimental settings.
Proposed Method
- Define a meta-critic comprising a meta-value network (MVN) and a task-actor encoder (TAEN).
- Use a task-actor embedding z_t = C_ω(L_t^k) to condition the critic on the current task and actor.
- TAEN reads a learning trace L_t^k = [(s_{t−k}, a_{t−k}, r_{t−k}), ..., (s_{t−1}, a_{t−1}, r_{t−1})], that is, the k most recent state-action-reward triples, to produce z_t.
- Train actors across multiple tasks with the meta-critic providing supervision via Q_φ(s_t, a_t, z_t) and TD-like updates.
- Extend the framework to discrete and continuous action RL, and to a supervised learning setting via a one-step actor-environment game where reward is negative loss.
- Leverage unlabelled data during meta-testing by using the meta-critic’s supervision without ground-truth labels.
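The data flow in the list above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the class names `TaskActorEncoder` and `MetaValueNetwork`, all dimensions, the fixed random linear maps standing in for trained networks, the standard one-step TD target, and the squared-error loss in the supervised case are assumptions made for the example.

```python
import math
import random
from collections import deque

# Hypothetical stand-ins: the paper uses trained neural networks, whereas
# these are fixed random linear maps that only illustrate the data flow.

def make_linear(n_in, n_out, rng):
    """Random (n_out x n_in) weight matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def linear(weights, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

class TaskActorEncoder:
    """Stand-in for C_omega: maps the last k (s, a, r) triples to z_t."""
    def __init__(self, k, triple_dim, embed_dim, rng):
        self.k = k
        self.triple_dim = triple_dim
        self.weights = make_linear(k * triple_dim, embed_dim, rng)

    def __call__(self, trace):
        flat = [v for triple in trace for v in triple]
        # Zero-pad on the left while fewer than k steps have been observed.
        flat = [0.0] * (self.k * self.triple_dim - len(flat)) + flat
        return [math.tanh(v) for v in linear(self.weights, flat)]

class MetaValueNetwork:
    """Stand-in for Q_phi(s_t, a_t, z_t): scalar value from state, action, embedding."""
    def __init__(self, state_dim, action_dim, embed_dim, rng):
        self.weights = make_linear(state_dim + action_dim + embed_dim, 1, rng)

    def __call__(self, state, action, z):
        return linear(self.weights, state + action + z)[0]

def td_target(reward, gamma, next_q):
    """Assumed one-step TD target for regressing the critic: r + gamma * Q'."""
    return reward + gamma * next_q

def supervised_reward(prediction, label):
    """One-step game view of supervised learning: reward is the negative loss
    (squared error is an assumption made for this example)."""
    return -(prediction - label) ** 2

rng = random.Random(0)
k, state_dim, action_dim, embed_dim = 3, 2, 1, 4
encoder = TaskActorEncoder(k, state_dim + action_dim + 1, embed_dim, rng)
critic = MetaValueNetwork(state_dim, action_dim, embed_dim, rng)

trace = deque(maxlen=k)  # keeps only the k most recent (s, a, r) triples
for step in range(5):
    state = [rng.random(), rng.random()]
    action = [rng.random()]
    reward = rng.random()
    z_t = encoder(list(trace))        # embedding built from steps t-k .. t-1
    q_t = critic(state, action, z_t)  # critic's value for the current step
    trace.append(state + action + [reward])
```

Because the critic sees only the state, the action, and the embedding, the same `critic` object could in principle score any actor on any task. In this sketch the embedding is what tells it which task and actor it is looking at.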
Experimental Results
Research Questions
- RQ1: Can a single meta-critic, conditioned on task and actor, effectively supervise diverse actors across multiple tasks in RL and SL?
- RQ2: Does task conditioning via a task-actor encoder enable robust transfer in multi-task meta-learning with diverse task distributions?
- RQ3: Can semi-supervised data be exploited during meta-testing to further improve sample efficiency?
- RQ4: How does meta-critic guidance compare to existing meta-learning approaches (e.g., MAML) across SL and RL benchmarks?
- RQ5: What is the impact of using a shared meta-critic on rapid adaptation to new tasks with few trials or demonstrations?
Key Findings
- The meta-critic framework enables rapid adaptation for new tasks in both RL and supervised learning settings.
- Conditioning on the TAEN embedding lets the critic generalise across diverse task distributions, improving performance on mixture tasks where single-prior methods struggle.
- In supervised learning, the meta-critic can supervise learning from few labeled examples and also leverage unlabelled data during meta-testing.
- Across RL experiments (dependent multi-armed bandits and cartpole), Meta-Critic outperforms standard, All+FT, and MAML baselines in sample-efficient learning and final performance.
- The learned TAEN embeddings reflect task structure (e.g., cartpole pole length) without explicit exposure to the task parameter, indicating meaningful task manifolds.
This summary was generated by AI and reviewed by a human editor.