QUICK REVIEW

[Paper Digest] DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task

Honglin Xiong, Sheng Wang|arXiv (Cornell University)|Apr 3, 2023

Artificial Intelligence in Healthcare and Education|Cited by 71

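One-Sentence Summary

This paper shows how ChatGLM-6B, a bilingual Chinese-English dialogue model, can be fine-tuned into the medical dialogue model DoctorGLM using efficient techniques such as LoRA, making medical-domain fine-tuning affordable on budget-limited hardware. It offers a low-cost pipeline for multilingual medical LLMs and shares practical results and limitations.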

ABSTRACT

The recent progress of large language models (LLMs), including ChatGPT and GPT-4, in comprehending and responding to human instructions has been remarkable. Nevertheless, these models typically perform better in English and have not been explicitly trained for the medical domain, resulting in suboptimal precision in diagnoses, drug recommendations, and other medical advice. Additionally, training and deploying a dialogue model is still believed to be impossible for hospitals, hindering the promotion of LLMs. To tackle these challenges, we have collected databases of medical dialogues in Chinese with ChatGPT's help and adopted several techniques to train an easy-deploy LLM. Remarkably, we were able to fine-tune the ChatGLM-6B on a single A100 80G in 13 hours, which means having a healthcare-purpose LLM can be very affordable. DoctorGLM is currently an early-stage engineering attempt and contains various mistakes. We are sharing it with the broader community to invite feedback and suggestions to improve its healthcare-focused capabilities: https://github.com/xionghonglin/DoctorGLM.

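Motivation and Objectives

Advance the development of language models for the medical domain and for non-English languages.
Describe a low-cost, end-to-end pipeline for fine-tuning a Chinese medical dialogue model.
Demonstrate techniques that make a medical LLM feasible on affordable hardware.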

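Proposed Method

Fine-tune ChatGLM-6B with LoRA on Chinese medical dialogue data using an A100 80G GPU.
Translate English medical datasets into Chinese: ChatGPT translates a small, high-quality subset, which is then used to fine-tune a BART-based model that performs the large-scale translation (a form of distillation).
Introduce a prompt designer module that uses disease knowledge (sourced from the Merck Manual) to guide answers.
Compare LoRA with P-tuning V2 as parameter-efficient fine-tuning methods.
Evaluate generation under top-p and temperature settings, which control output diversity.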
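The LoRA idea named in the method list can be illustrated with a toy layer. This is a minimal sketch in plain Python, not the authors' training code: the class name `LoRALinear`, the helper `matvec`, and the sizes, rank, and `alpha` value are all hypothetical choices made for illustration. It shows the two properties that make LoRA cheap: the pretrained weight stays frozen, and only two small low-rank matrices are trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero
        # before any training step and the layer behaves like the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Count only the entries of A and B; the frozen weight is excluded."""
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


# Toy sizes (hypothetical): a 64x64 layer adapted with rank 4.
demo_rng = random.Random(1)
demo_weight = [[demo_rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
demo_layer = LoRALinear(demo_weight, rank=4, alpha=8)
print("trainable:", demo_layer.trainable_parameters())
print("full fine-tuning:", 64 * 64)
```

In this toy setting the adapter trains 512 values instead of the 4,096 in the full weight matrix. The same ratio argument, applied to the much larger matrices of a 6B-parameter model, is why LoRA fits on a single GPU.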

Figure 1: Overview of DoctorGLM fine-tuning and inference pipeline.
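On the inference side, the method relies on temperature and top-p (nucleus) sampling to control how varied the generated answers are. The sketch below shows the standard form of both techniques over a toy list of logits. The function names and the example logits are hypothetical and are not taken from the DoctorGLM code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after temperature scaling and nucleus filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy logits for a three-token vocabulary (hypothetical values).
print(sample_token([2.0, 1.0, 0.0], temperature=0.7, top_p=0.9, rng=random.Random(0)))
```

A small `top_p` or a low temperature pushes the model toward its most likely tokens, which gives more conservative answers. Larger values admit less likely tokens and increase diversity.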

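Experimental Results

Research Questions

RQ1: Can a medical LLM be fine-tuned effectively for Chinese using resource-efficient methods?
RQ2: What are the hardware and time costs of training a Chinese medical dialogue model on in-house data?
RQ3: How do LoRA and P-tuning V2 compare in fine-tuning efficiency and performance in the medical domain?
RQ4: What role does the prompt designer module play in improving the reliability and accuracy of medical answers?
RQ5: What are the practical limitations and deployment considerations for such models at the hospital level?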
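To make RQ4 concrete, the following is a minimal sketch of what a knowledge-guided prompt designer could look like. It is an assumption-laden illustration, not the paper's module: the `KNOWLEDGE_BASE` dictionary, the keyword-matching rule, and the prompt wording are all hypothetical, and the entries are placeholders rather than real medical reference text.

```python
# Hypothetical knowledge base: disease name -> reference text.
# The values are placeholders, not real medical content.
KNOWLEDGE_BASE = {
    "hypertension": "<reference text about hypertension goes here>",
    "diabetes": "<reference text about diabetes goes here>",
}


def find_relevant_entries(question, knowledge_base):
    """Return entries whose disease name appears in the question (naive keyword match)."""
    lowered = question.lower()
    return {name: text for name, text in knowledge_base.items() if name in lowered}


def build_prompt(question, knowledge_base):
    """Prepend matching reference knowledge to the question; pass it through if none match."""
    entries = find_relevant_entries(question, knowledge_base)
    if not entries:
        return question
    context = "\n".join(f"[{name}] {text}" for name, text in entries.items())
    return (
        f"Reference knowledge:\n{context}\n\n"
        f"Patient question: {question}\n"
        "Answer using the reference knowledge above."
    )


print(build_prompt("What should I do about hypertension?", KNOWLEDGE_BASE))
```

The design choice illustrated here is that the language model is not asked to recall disease facts on its own. The prompt carries vetted reference text, and the model is steered to answer from it.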

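Key Findings

Fine-tuning ChatGLM-6B with LoRA on Chinese medical dialogues, which yields DoctorGLM, completes within 13 hours on a single A100 80G GPU.
Under the described setup, fine-tuning on 100,000 question-answer pairs costs approximately 18.75 USD.
Inference can run on a consumer-grade GPU with roughly 13 GB of memory, although deployment constraints remain.
LoRA and P-tuning V2 deliver comparable performance but make different trade-offs in parameter efficiency.
The authors acknowledge several technical limitations and describe the work as an early-stage engineering effort.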
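The roughly 13 GB inference figure is consistent with a simple back-of-the-envelope estimate of weight storage. The sketch below assumes about 6.2 billion parameters for ChatGLM-6B and 16-bit weights. Both values are assumptions brought in for illustration and are not stated in this digest, and the estimate ignores activations and the attention cache, which add to the total.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Estimate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: parameter count as commonly reported for ChatGLM-6B.
ASSUMED_PARAMETERS = 6.2e9
# Assumption: weights held in 16-bit floating point (2 bytes each).
BYTES_FP16 = 2

print(f"{weight_memory_gb(ASSUMED_PARAMETERS, BYTES_FP16):.1f} GB for weights")
```

Under these assumptions the weights alone take about 12.4 GB, which leaves little headroom on a 13 GB budget and helps explain why deployment constraints remain.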

Figure 2: The implementation of large-scale translation. A tiny and high-quality dataset is built through ChatGPT. The collected dataset serves as a fine-tuning set for a pre-trained language model, enabling it to perform specialized machine translation.


This digest was generated by AI and reviewed by human editors.