QUICK REVIEW

[Paper Review] HyperCLOVA X Technical Report

Kang Min Yoo, Jaegeun Han|arXiv (Cornell University)|Apr 2, 2024

Topic Modeling|Cited by 6

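One-Sentence Summary

HyperCLOVA X is a Korean-focused family of large language models (HCX-L and HCX-S) trained on Korean, English, and code data, then aligned with instruction tuning and RLHF. It shows strong Korean ability, competitive English performance, multilingual translation ability, and a careful safety evaluation.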

ABSTRACT

We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

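Research Motivation and Goals

Develop a Korean-centric LLM family that fits the Korean language and culture while retaining strong English and coding ability.
Pretrain on a balanced mix of Korean, English, and code data, then apply supervised fine-tuning (SFT) and RLHF for alignment.
Evaluate on comprehensive Korean and English benchmarks to demonstrate bilingual ability, multilingual generalization, and safety compliance.
Demonstrate multilingual capabilities, including cross-lingual inference and machine translation between Korean and other Asian languages.
Discuss safety, red teaming, and responsible AI considerations, with sovereign LLM development in view.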

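Proposed Method

A Transformer decoder architecture with rotary position embeddings and pre-normalization.
A morpheme-aware byte-level BPE tokenizer with a 100,000-token vocabulary, optimized for Korean.
Joint PSM (prefix-suffix-middle) and SPM (suffix-prefix-middle) pretraining to give the model infilling capability.
Supervised fine-tuning (SFT) to improve instruction following across domains.
Reinforcement learning from human feedback (RLHF) with PPO, using a reward model and a KL penalty to align the model with human preferences.
An asynchronous, event-driven alignment pipeline built on NSML with CLOps and MLflow; sequence-level unlikelihood training is combined with PPO to reduce repetition.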
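
The PSM and SPM layouts named above can be illustrated with a small data-formatting sketch. This is not the report's preprocessing code: the sentinel strings `<PRE>`, `<SUF>`, and `<MID>`, the character-level split, the 50/50 mixing ratio, and all function names are assumptions chosen for illustration. The only idea carried over is that a document is cut into prefix, middle, and suffix, and the middle is moved to the end so a left-to-right decoder learns to fill it in.

```python
import random

# Hypothetical sentinel strings; the report's actual special tokens are not given here.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def split_document(text, rng):
    """Cut text at two random character positions into (prefix, middle, suffix)."""
    i, j = sorted(rng.randint(0, len(text)) for _ in range(2))
    return text[:i], text[i:j], text[j:]

def to_psm(prefix, middle, suffix):
    """Prefix-Suffix-Middle: prefix and suffix come first, the middle is generated last."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def to_spm(prefix, middle, suffix):
    """Suffix-Prefix-Middle: same pieces, with the suffix presented before the prefix."""
    return f"{SUF}{suffix}{PRE}{prefix}{MID}{middle}"

def make_infilling_example(text, rng, spm_prob=0.5):
    """Pick SPM with probability spm_prob (an assumed ratio), otherwise PSM."""
    prefix, middle, suffix = split_document(text, rng)
    formatter = to_spm if rng.random() < spm_prob else to_psm
    return formatter(prefix, middle, suffix)

if __name__ == "__main__":
    rng = random.Random(0)
    print(make_infilling_example("def add(a, b): return a + b", rng))
```

In both layouts the training loss is still ordinary next-token prediction; only the order of the pieces changes, which is why infilling can be learned without modifying the decoder architecture.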

Figure 1 : Performance comparison of HyperCLOVA X with other leading open-source LLMs on a wide range of benchmarks, incorporating both Korean and English tests. The largest model in the HyperCLOVA X family is shown. Our evaluation indicates that HyperCLOVA X strongly outperforms all other models ta

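Experimental Results

Research Questions

RQ1: How does HyperCLOVA X perform on Korean-specific benchmarks, and how does that compare with its results on English benchmarks and against multilingual baselines?
RQ2: How strong are its cross-lingual ability and translation quality between Korean and other languages?
RQ3: Does instruction tuning in one language produce instruction-following ability in another (cross-lingual transfer)?
RQ4: How effective are the safety and alignment procedures at reducing harmful content while preserving utility?
RQ5: What do bilingual and multilingual evaluations reveal about sovereign LLM development and regional AI policy?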

Key Findings

Model	Korean	English	CS	Fact.	Math	Code	Harmlessness	Overall
HCX-S	61.73	47.08	76.56	46.88	39.04	37.71	62.08	53.01
HCX-L	72.07	58.25	87.26	56.83	50.91	46.10	67.32	62.68
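
As a quick sanity check on the table, the last column appears to match the unweighted mean of the seven category scores in each row. The sketch below only reproduces that arithmetic from the numbers shown; treating the last column as a simple average is an inference from the figures, not a statement taken from the report, and the variable names are illustrative.

```python
# Category scores copied from the two table rows, in column order
# (the seven columns between the model name and the last column).
scores = {
    "HCX-S": [61.73, 47.08, 76.56, 46.88, 39.04, 37.71, 62.08],
    "HCX-L": [72.07, 58.25, 87.26, 56.83, 50.91, 46.10, 67.32],
}
# Values from the last column of the same rows.
reported_overall = {"HCX-S": 53.01, "HCX-L": 62.68}

def unweighted_mean(values):
    """Plain arithmetic mean; every category gets equal weight (an assumption)."""
    return sum(values) / len(values)

if __name__ == "__main__":
    for model, values in scores.items():
        mean = unweighted_mean(values)
        print(f"{model}: mean={mean:.2f}, reported={reported_overall[model]:.2f}")
```

Both rows agree to two decimal places, so the gap of roughly ten points between HCX-S and HCX-L in the last column reflects a fairly uniform gain across categories rather than one outlier.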

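HCX-L leads Korean-oriented baselines on comprehensive Korean benchmarks.
HCX-L performs on par with the largest LLaMA 2 model on English benchmarks.
The HCX models show strong cross-lingual transfer and superior Korean-English bilingual ability.
Machine translation between Korean and untargeted languages such as Japanese and Chinese reaches state-of-the-art quality.
Safety evaluation and red teaming, combined with SFT/RLHF and further reinforcement techniques, support responsible AI deployment in sovereign settings.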

(a) HCX-S trained and tested on OpenOrca.


This review was generated by AI and checked by a human editor.