QUICK REVIEW

[Paper Review] Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

Aryo Pradipta Gema, Pasquale Minervini|arXiv (Cornell University)|Jul 6, 2023

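Topic Modeling|Citations: 9

Summary in a sentence

This paper proposes Clinical LLaMA-LoRA and Downstream LLaMA-LoRA, a two-step PEFT framework for adapting LLaMA to clinical tasks. Its LoRA-based adapters substantially reduce compute requirements while achieving a higher macro-averaged AUROC on clinical NLP tasks than clinically trained language models.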

ABSTRACT

Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. Parameter-Efficient Fine-Tuning (PEFT) techniques for fine-tuning language models significantly reduce computational requirements by selectively fine-tuning small subsets of parameters. In this study, we propose a two-step PEFT framework and evaluate it in the clinical domain. Our approach combines a specialised PEFT adapter layer designed for clinical domain adaptation with another adapter specialised for downstream tasks. We evaluate the framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our framework achieves a better AUROC score averaged across all clinical downstream tasks compared to clinical language models. In particular, we observe large improvements of 4-5% AUROC in large-scale multilabel classification tasks, such as diagnoses and procedures classification. To our knowledge, this study is the first to provide an extensive empirical analysis of the interplay between PEFT techniques and domain adaptation in an important real-world domain of clinical applications.

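Motivation and objectives

Motivates the need for efficient domain adaptation of large language models to clinical data, given the high cost of full fine-tuning.
Proposes a LoRA-based adapter for clinical domain adaptation (Clinical LLaMA-LoRA).
Introduces Downstream LLaMA-LoRA, built on top of Clinical LLaMA-LoRA.
Evaluates the framework on multiple clinical downstream tasks, comparing it with clinically trained LLMs and baselines.
Shows gains in computational efficiency and performance over conventional approaches.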

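Proposed method

Performs domain-adaptive pretraining of LLaMA and PMC-LLaMA on MIMIC-IV clinical notes with several PEFT techniques (LoRA, Adaptation Prompt, Prefix Tuning, Prompt Tuning, P-tuning), then selects the best technique (LoRA) by perplexity.
Two-step adapter framework: first train Clinical LLaMA-LoRA (a domain-adapted LoRA on LLaMA/PMC-LLaMA), then train Downstream LLaMA-LoRA on top of the clinical adapter.
Runs downstream evaluation on five clinical tasks: Prolonged Mechanical Ventilation (PMV), In-hospital Mortality (MOR), Length of Stay (LOS), Diagnoses (DIAG), and Procedures (PROC); reports AUROC and macro-averaged AUROC.
Compares against Bio+ClinicalBERT, BlueBERT, CORe, UmlsBERT, and LoRA-equipped baselines, and analyses frozen vs. trainable clinical adapters.
Highlights that LoRA-based adapters are highly parameter-efficient (0.02%-0.24% trainable parameters) and train in under 24 hours per epoch on a single A100-80GB GPU.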
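
The small trainable-parameter fractions quoted above follow from how LoRA works: each adapted weight matrix stays frozen, and only two low-rank factors, A (rank x d_in) and B (d_out x rank), are trained. The arithmetic can be sketched roughly as follows. The layer count, hidden size, rank, total parameter count, and number of adapted matrices per layer are all assumptions chosen for illustration and are not taken from the paper.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def trainable_fraction(total_params, n_layers, hidden, rank, adapted_per_layer):
    """Share of trainable parameters when only the LoRA factors are trained."""
    per_matrix = lora_trainable_params(hidden, hidden, rank)
    trainable = n_layers * adapted_per_layer * per_matrix
    return trainable / (total_params + trainable)


# Hypothetical configuration, loosely shaped like a 7B decoder-only model:
# 32 layers, hidden size 4096, rank 8, two square projections adapted per layer.
frac = trainable_fraction(
    total_params=6_700_000_000,
    n_layers=32,
    hidden=4096,
    rank=8,
    adapted_per_layer=2,
)
print(f"{frac:.4%}")
```

Raising the rank or adapting more matrices per layer increases the fraction linearly, which is one way a range of values rather than a single number can arise.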

Figure 1: An illustration of the proposed two-step PEFT framework. Clinical LLaMA-LoRA fine-tunes the pretrained LLaMA to the clinical domain. Downstream LLaMA-LoRA further fine-tunes the domain-adapted model to downstream clinical tasks.
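
One plausible way to read the two-step framework in Figure 1 is as two additive low-rank updates on the same frozen weight: the clinical adapter from step 1, then the downstream adapter from step 2. The toy sketch below illustrates that reading in plain Python. How the authors actually stack the adapters is not stated in this review, so the additive composition, the `alpha / rank` scaling, and every number here are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


def effective_weight(w_frozen, adapters, alpha=1.0):
    """Frozen base weight plus the sum of low-rank updates B @ A, each scaled by alpha / rank."""
    w = [row[:] for row in w_frozen]  # work on a copy so the base weight is never modified
    for b, a in adapters:
        rank = len(a)  # A has shape (rank x d_in)
        w = add(w, scale(matmul(b, a), alpha / rank))
    return w


# Toy 2x2 example with rank-1 adapters (all numbers are made up).
W = [[1.0, 0.0], [0.0, 1.0]]                 # pretrained weight, frozen
clinical = ([[1.0], [0.0]], [[0.0, 2.0]])    # (B, A) from step 1: domain adaptation
downstream = ([[0.0], [1.0]], [[3.0, 0.0]])  # (B, A) from step 2: task fine-tuning

step1 = effective_weight(W, [clinical])
step2 = effective_weight(W, [clinical, downstream])
print(step1)
print(step2)
```

In this reading, a frozen clinical adapter means its (B, A) pair is fixed while only the downstream pair is updated, and a trainable one means both pairs receive gradients during task fine-tuning.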

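Experimental results

Research questions

RQ1: Can LoRA improve AUROC when fine-tuning LLaMA/PMC-LLaMA for clinical downstream tasks?
RQ2: Does combining Downstream LLaMA-LoRA on top of Clinical LLaMA-LoRA further improve performance?
RQ3: Is LLaMA/PMC-LLaMA equipped with Clinical LLaMA-LoRA competitive with, or better than, clinically trained LLMs?
RQ4: Does the effect of downstream fine-tuning depend on whether Clinical LLaMA-LoRA is frozen or trainable?
RQ5: Does LoRA also benefit LMs that are already clinically trained (e.g., BlueBERT, UmlsBERT) on downstream tasks?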

Key findings

LLaMA with LoRA reaches a macro-averaged AUROC of 71.62% across tasks; PMC-LLaMA with LoRA reaches 72.71%.
LLaMA with a trainable Clinical LLaMA-LoRA reaches a macro AUROC of 70.85%; PMC-LLaMA with a trainable Clinical LLaMA-LoRA reaches 72.23%.
Adding Downstream LLaMA-LoRA on top of a trainable Clinical LLaMA-LoRA brings the macro-averaged AUROC to 72.81%.
BlueBERT with trainable LoRA outperforms full fine-tuning of BlueBERT, with a macro-averaged AUROC of 71.56% vs. 69.59%.
Clinical LLaMA-LoRA (frozen) shows only limited improvement (macro AUROC 61.58%), indicating that training the downstream adapter is essential for the gains.
Compared with clinically trained LMs, LLaMA with LoRA adapters achieves a competitive macro AUROC and outperforms several baselines on some diagnosis and procedure prediction tasks.
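
The headline metric in the findings above, macro-averaged AUROC, can be read as the unweighted mean of per-task AUROC values. A minimal sketch of that computation follows, using the pairwise (Mann-Whitney) definition of AUROC. The labels and scores are made up, and the exact aggregation used in the paper (for instance, averaging over labels within the multilabel DIAG and PROC tasks) is not specified in this review, so treat the per-task mean as an assumption.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(per_task):
    """Unweighted mean of per-task AUROC values."""
    values = [auroc(labels, scores) for labels, scores in per_task.values()]
    return sum(values) / len(values)


# Made-up labels and scores for two of the task names; not data from the paper.
toy = {
    "MOR": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    "PMV": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
}
print(macro_auroc(toy))
```

Because every task carries equal weight, a large gain on one task (such as the 4-5% improvements the abstract reports for diagnoses and procedures) can lift the macro average even when other tasks change little.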

Figure 2: Frameworks of domain-adaptive and downstream fine-tuning to adapt a pretrained LLM from the general domain to the clinical domain. As opposed to a full fine-tuning process which can be prohibitively expensive (left), our approach leverages PEFT techniques to introduce a clinically-speciali


This review was generated by AI and checked by a human editor.