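[Paper Summary] Qwen2.5 Technical Report
Qwen2.5 is a family of large language models spanning open-weight dense models and hosted MoE variants. Upgrades to pre-training data and training, together with extensive post-training that covers long-context capability, yield strong open-weight performance and competitive hosted offerings.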
In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.
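Research Motivation and Objectives
- Show how Qwen2.5 scales data and model size to improve reasoning, coding, mathematics, and instruction following.
- Describe how post-training strategies (SFT, offline/online RL) improve human preference alignment and long-context handling.
- Present the architecture, tokenizer, and MoE improvements that support diverse use cases and cost-effective deployment.
- Show how the open-weight models perform against contemporaneous models and against the hosted MoE variants, which target enterprise/API use.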
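Proposed Method
- Scale pre-training data from 7T tokens to 18T tokens, using a filtered data mixture with domain balancing.
- Add long-context pre-training that raises the RoPE base frequency and extends the context length to 32,768 tokens (Turbo reaches 262,144 through staged training).
- Use more than 1M supervised fine-tuning samples covering long-sequence generation, math/coding, structured data, and cross-lingual data.
- Apply two-stage reinforcement learning (offline DPO-style, then online GRPO) to optimize factuality, instruction following, and safety.
- Use an MoE architecture for the hosted variants (Qwen2.5-Turbo and Qwen2.5-Plus), and release open-weight dense models ranging from 0.5B to 72B parameters.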
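To make the RoPE base-frequency point above more concrete, here is a minimal sketch of how the base affects the rotation wavelengths of standard rotary embeddings. The head dimension (128) and the two base values (10,000 and 1,000,000) are assumptions chosen for illustration and are not taken from this summary; the helper name `rope_wavelengths` is hypothetical.

```python
import math


def rope_wavelengths(head_dim: int, base: float) -> list[float]:
    """Wavelength, in token positions, of each rotary frequency pair."""
    # Standard RoPE inverse frequencies: base ** (-2i / head_dim), i = 0 .. head_dim/2 - 1
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    # A pair rotating at angular frequency f repeats every 2*pi / f positions.
    return [2 * math.pi / f for f in inv_freq]


HEAD_DIM = 128  # assumed head dimension, illustration only

for base in (10_000.0, 1_000_000.0):  # assumed base values, illustration only
    longest = max(rope_wavelengths(HEAD_DIM, base))
    print(f"base={base:>11,.0f}  longest wavelength ~ {longest:,.0f} positions")
```

Under these assumed values, only the larger base gives the slowest-rotating pair a wavelength longer than a 262,144-token window, which is one intuition for why a higher base frequency tends to accompany context extension.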
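Experimental Results
Research Questions
- RQ1: How much does scaling pre-training data to 18 trillion tokens benefit diverse knowledge domains (understanding, coding, mathematics)?
- RQ2: How do long-context training and the extended context length affect generation quality and structured data handling?
- RQ3: Does multi-stage post-training (SFT, offline RL, online RL) improve human preference alignment and long-form task performance across domains?
- RQ4: How do the open-weight dense models and the MoE variants compare with same-generation models (e.g., Llama-3, Mixtral) on general, math, coding, and multilingual tasks?
- RQ5: What are the practical cost and latency trade-offs between Qwen2.5-Turbo/Plus and the standard open-weight models?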
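Key Findings
- The open-weight Qwen2.5-72B-Instruct performs competitively with the state-of-the-art open-weight model Llama-3-405B-Instruct, which is about 5 times larger.
- Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o, respectively.
- The larger pre-training corpus and the domain-balanced mixture improve domain expertise in knowledge, coding, and mathematics.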
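- Long-context and long-generation capability (generation length up to 8K tokens; Turbo context up to 1M tokens) markedly improves long-form generation and structured data handling.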
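- Post-training uses more than 1M SFT samples and two-stage RL (offline plus online) to improve instruction following, reasoning, and safety alignment.
- Qwen2.5 supports a broad open-weight ecosystem with multiple sizes (0.5B–72B), plus MoE variants for hosted use.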
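The two RL stages named in the findings can be illustrated with a small sketch of the core quantities, as DPO and GRPO are commonly described in the literature. This is not the Qwen team's implementation: the function names, the `beta` value, and the reward numbers are all hypothetical, and the use of population standard deviation is an assumption.

```python
import math
import statistics


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO-style loss for one preference pair: -log(sigmoid(beta * margin))."""
    # Margin: how much more the policy prefers the chosen response than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical log-probabilities for one (chosen, rejected) pair.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))

# Hypothetical reward-model scores for four responses sampled for one prompt.
print(group_relative_advantages([0.2, 0.9, 0.5, 0.4]))
```

The offline stage needs only a fixed set of preference pairs, while the online stage scores freshly sampled responses and ranks them relative to their own group, so no separate value network is required in this formulation.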
This summary was generated by AI and reviewed by human editors.