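[Paper Explained] Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Orca is a 13B-parameter model that learns to imitate GPT-4's reasoning by training on explanation-rich signals with progressive teacher guidance. It reaches parity with ChatGPT on Big-Bench Hard and performs strongly on AGIEval and professional exams, while still trailing GPT-4.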
Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs; small scale homogeneous training data; and most notably a lack of rigorous evaluation resulting in overestimating the small model's capability as they tend to learn to imitate the style, but not the reasoning process of LFMs. To address these challenges, we develop Orca (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy to be published at https://aka.ms/orca-lm), a 13-billion parameter model that learns to imitate the reasoning process of LFMs. Orca learns from rich signals from GPT-4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (4 pts gap with optimized system message) in professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without CoT; while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.
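Motivation and Goals
- Close the gap between imitating the style of large foundation models (LFMs) and learning their reasoning process.
- Use explanation-rich signals and system instructions to enable progressive learning in smaller models.
- Scale up and diversify the training data to improve zero-shot reasoning and open-ended generation.
- Evaluate Orca on open-ended generation, reasoning benchmarks, safety, and professional exams.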
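Proposed Method
- Augment ⟨query, response⟩ pairs with GPT-4 explanation traces that expose the reasoning process.
- Use system instructions to elicit explanations and step-by-step thought processes.
- Use FLAN-v2 (Flan 2022) as a large, diverse task collection and sample zero-shot prompts from it.
- Two-stage teacher strategy: first train on 5M ChatGPT-augmented instructions, then fine-tune on 1M GPT-4-augmented instructions.
- Train with LLaMA-style BPE tokenization and a 32,001-token vocabulary; use packing for efficiency.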
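The data pipeline described in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Example`, `render`, `toy_tokenize`, `pack`, and `two_stage_schedule`, the `### System:` style template, and the `max_len` value are all assumptions made for illustration, and whitespace splitting stands in for the real BPE tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    system: str    # system instruction that elicits an explanation
    query: str     # user query, e.g. a sampled zero-shot prompt
    response: str  # teacher response containing the explanation trace


def render(ex: Example) -> str:
    """Join the three fields into one training string (hypothetical template)."""
    return (
        f"### System:\n{ex.system}\n"
        f"### User:\n{ex.query}\n"
        f"### Assistant:\n{ex.response}"
    )


def toy_tokenize(text: str) -> List[str]:
    """Whitespace split, standing in for a real BPE tokenizer (assumption)."""
    return text.split()


def pack(sequences: List[List[str]], max_len: int) -> List[List[str]]:
    """Greedily concatenate token sequences into bins of at most max_len tokens.

    Input order is preserved; a sequence longer than max_len is truncated.
    """
    bins: List[List[str]] = []
    current: List[str] = []
    for seq in sequences:
        seq = seq[:max_len]
        # Start a new bin when the next sequence would overflow the current one.
        if current and len(current) + len(seq) > max_len:
            bins.append(current)
            current = []
        current = current + seq
    if current:
        bins.append(current)
    return bins


def two_stage_schedule(
    chatgpt_data: List[Example], gpt4_data: List[Example]
) -> List[Example]:
    """Order the data so the intermediate-teacher (ChatGPT) set comes first."""
    return list(chatgpt_data) + list(gpt4_data)


if __name__ == "__main__":
    stage1 = [Example("Explain your reasoning step by step.", "What is 2 + 2?", "2 + 2 = 4.")]
    stage2 = [Example("Think carefully, then justify your answer.", "Is 7 prime?", "Yes: it has no divisors other than 1 and 7.")]
    ordered = two_stage_schedule(stage1, stage2)
    tokenized = [toy_tokenize(render(ex)) for ex in ordered]
    for packed in pack(tokenized, max_len=64):
        print(len(packed), "tokens in packed sequence")
```

Packing several short examples into one fixed-length sequence reduces the number of padding tokens per batch, which is the efficiency gain the last bullet refers to. A real pipeline would also need to keep track of example boundaries so the loss is computed only on the intended tokens; that detail is omitted here.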
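Experimental Results
Research Questions
- RQ1: Can explanation tuning enable a small model to learn the reasoning process of larger LFMs?
- RQ2: Does progressive learning through an intermediate teacher (ChatGPT first, then GPT-4) improve zero-shot reasoning and task performance?
- RQ3: How close can a 13B model get to ChatGPT and GPT-4 on complex reasoning benchmarks and professional exams?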
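Key Findings
- On complex zero-shot reasoning in Big-Bench Hard (BBH), Orca surpasses Vicuna-13B by more than 100%.
- On AGIEval, Orca improves on instruction-tuned baselines such as Vicuna-13B by 42%.
- Orca reaches parity with ChatGPT on BBH in the zero-shot setting, without CoT.
- Orca is competitive on the SAT, LSAT, GRE, and GMAT (zero-shot, multiple-choice questions), with a gap of only 4 points when an optimized system message is used.
- Orca still trails GPT-4, but explanation-based learning yields strong gains in reasoning and comprehension.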
This explanation was generated by AI and reviewed by human editors.