QUICK REVIEW

[Paper Review] Toward Global Large Language Models in Medicine

Rui Yang, Huitao Li|arXiv (Cornell University)|Jan 5, 2026

Machine Learning in Healthcare | Cited by 1

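One-Sentence Summary

The paper builds GlobMed, a medical dataset of over 500,000 entries across 12 languages; uses GlobMed-Bench to evaluate 56 large language models (LLMs); and trains GlobMed-LLMs (1.7B to 8B parameters), which improve performance most markedly on low-resource languages.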

ABSTRACT

Despite continuous advances in medical technology, the global distribution of health care resources remains uneven. The development of large language models (LLMs) has transformed the landscape of medicine and holds promise for improving health care quality and expanding access to medical information globally. However, existing LLMs are primarily trained on high-resource languages, limiting their applicability in global medical scenarios. To address this gap, we constructed GlobMed, a large multilingual medical dataset, containing over 500,000 entries spanning 12 languages, including four low-resource languages. Building on this, we established GlobMed-Bench, which systematically assesses 56 state-of-the-art proprietary and open-weight LLMs across multiple multilingual medical tasks, revealing significant performance disparities across languages, particularly for low-resource languages. Additionally, we introduced GlobMed-LLMs, a suite of multilingual medical LLMs trained on GlobMed, with parameters ranging from 1.7B to 8B. GlobMed-LLMs achieved an average performance improvement of over 40% relative to baseline models, with a more than threefold increase in performance on low-resource languages. Together, these resources provide an important foundation for advancing the equitable development and application of LLMs globally, enabling broader language communities to benefit from technological advances.

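Research Motivation and Goals

Address the disparities across languages in medical language resources and in model performance.
Create GlobMed, a large multilingual medical dataset covering 12 languages, including low-resource languages.
Benchmark existing LLMs on multilingual medical tasks with GlobMed-Bench to expose per-language performance gaps.
Train multilingual medical LLMs (GlobMed-LLMs) on GlobMed to make medical AI more accessible worldwide.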

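Proposed Method

Assemble GlobMed, covering 12 languages and over 500,000 entries, including four low-resource languages.
Establish GlobMed-Bench, which systematically evaluates 56 state-of-the-art proprietary and open-weight LLMs on multilingual medical tasks.
Train GlobMed-LLMs on GlobMed, with model sizes ranging from 1.7B to 8B parameters.
Measure the relative performance gain of GlobMed-LLMs over baseline models, with a focus on low-resource languages.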
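
The per-language comparison that a benchmark of this kind relies on can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name `accuracy_by_language`, the record layout, the language codes, and the toy records are all assumptions made for the example, and the summary does not state which metrics GlobMed-Bench actually uses.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: accuracy} from (language, is_correct) pairs.

    Hypothetical helper for illustration; not part of GlobMed-Bench.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for language, is_correct in records:
        totals[language] += 1
        correct[language] += int(is_correct)
    return {lang: correct[lang] / totals[lang] for lang in totals}


# Toy records invented for this sketch; they are NOT results from the paper.
toy_records = [
    ("en", True), ("en", True), ("en", False), ("en", True),
    ("sw", True), ("sw", False), ("sw", False), ("sw", False),
]

per_language = accuracy_by_language(toy_records)

# The spread between the best- and worst-served language is one simple way
# to summarize a cross-language performance disparity.
gap = max(per_language.values()) - min(per_language.values())
print(per_language, gap)
```

Grouping scores by language before averaging keeps a large high-resource test set from masking weak results on a small low-resource one.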

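Experimental Results

Research Questions

RQ1: How does multilingual medical data coverage affect LLM performance in each language?
RQ2: What performance gaps exist among the 56 LLMs on multilingual medical tasks?
RQ3: Does training on GlobMed improve LLM performance on low-resource languages?
RQ4: How large are the gains of GlobMed-LLMs over the baselines, overall and on low-resource languages?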

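Key Findings

GlobMed covers 12 languages and over 500,000 medical entries, including four low-resource languages.
GlobMed-Bench reveals significant performance disparities across languages, particularly for low-resource languages.
GlobMed-LLMs (1.7B to 8B) achieve an average performance improvement of over 40% relative to baseline models.
Relative to the baselines, GlobMed-LLMs improve performance on low-resource languages more than threefold.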
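
The two headline figures use different scales, so it can help to see how a relative improvement and a fold change are computed from the same pair of scores. The sketch below assumes the conventional definitions; the summary does not say exactly how the paper aggregates its scores, and the input numbers are invented for illustration only.

```python
def relative_improvement(baseline, improved):
    """(improved - baseline) / baseline, e.g. 0.4 means a 40% relative gain."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved - baseline) / baseline


def fold_change(baseline, improved):
    """improved / baseline, e.g. 3.0 means the score tripled."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return improved / baseline


# Hypothetical scores in percentage points; NOT figures from the paper.
overall_gain = relative_improvement(50.0, 70.0)   # 0.4, i.e. a 40% relative gain
low_resource_fold = fold_change(20.0, 60.0)       # 3.0, i.e. a threefold score
print(overall_gain, low_resource_fold)
```

Under these definitions a threefold score corresponds to a 200% relative improvement, which shows why a low baseline on low-resource languages can produce a much larger relative figure than the overall average.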

This review was generated by AI and checked by a human editor.