QUICK REVIEW

[Paper Review] BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT

Yirong Chen, Zhenyu Wang|arXiv (Cornell University)|Oct 24, 2023

Topic Modeling | Citations: 22

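One-Line Summary

This paper introduces BianQue, a health LLM fine-tuned on a large multi-turn health conversation corpus. By balancing proactive questioning (chain of questioning, CoQ) with health suggestions, it outperforms the baselines on multiple benchmarks.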

ABSTRACT

Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, and DoctorGLM. However, the limited information users provide in a single turn leads to inadequate personalization and targeting of the generated suggestions, so users must select the useful parts on their own. This is mainly caused by the lack of ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually employ a series of iterative inquiries to comprehend the patient's condition thoroughly, which enables them to subsequently provide effective and personalized suggestions; for LLMs, this process can be defined as chain of questioning (CoQ). To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM fine-tuned on the self-constructed health conversation dataset BianQueCorpus, which consists of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that the proposed BianQue can simultaneously balance the capabilities of both questioning and health suggestions, which will help promote the research and application of LLMs in the field of proactive health.
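
The chain of questioning described above unfolds over several patient and doctor turns. The sketch below shows one way such a history might be flattened into a single prompt so that a model generates the next doctor turn. It assumes a plain-text template. The role labels `Patient: ` / `Doctor: ` and the helper `build_prompt` are hypothetical and are not BianQue's actual input format.

```python
# Hypothetical sketch: flatten a multi-turn health conversation into one prompt
# with explicit patient/doctor role markers. The labels and newline separator
# are assumptions for illustration, not BianQue's documented template.
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], patient_utterance: str,
                 patient_tag: str = "Patient: ", doctor_tag: str = "Doctor: ") -> str:
    """Return a prompt that ends with the doctor tag, so the model writes the next doctor turn."""
    lines = []
    for patient_turn, doctor_turn in history:
        lines.append(patient_tag + patient_turn)
        lines.append(doctor_tag + doctor_turn)
    lines.append(patient_tag + patient_utterance)
    lines.append(doctor_tag)  # left open for the model to complete
    return "\n".join(lines)


history = [("I have had a headache for two days.",
            "Is the pain on one side or both sides?")]
print(build_prompt(history, "Mostly on the left side."))
```

Keeping earlier doctor questions in the prompt is what lets a model continue a line of inquiry rather than answer each turn in isolation.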

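Motivation and Objectives

Motivates the need for proactive multi-turn questioning to improve personalization in health LLMs.
Proposes BianQueCorpus, a large-scale multi-turn health conversation corpus with balanced questioning and suggestions.
Develops BianQue, a ChatGLM-based LLM fine-tuned on BianQueCorpus, to improve CoQ and suggestion quality.
Evaluates BianQue against multiple baselines on Chinese health-conversation benchmarks and introduces the Proactive Questioning Ability (PQA) metric.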
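
The review names the PQA metric but does not give its formula. Purely as an illustration, one naive proxy for questioning behaviour is the fraction of model replies that contain a question mark. The helpers `asks_question` and `questioning_rate` are hypothetical, and this proxy is not the paper's PQA definition.

```python
# Hypothetical proxy only: NOT the paper's PQA formula, which the review omits.
# A reply counts as "questioning" if it contains an ASCII "?" or the
# full-width "？" that is common in Chinese text.
from typing import Sequence

QUESTION_MARKS = ("?", "？")


def asks_question(response: str) -> bool:
    """True if the response contains at least one question mark."""
    return any(mark in response for mark in QUESTION_MARKS)


def questioning_rate(responses: Sequence[str]) -> float:
    """Fraction of responses that ask at least one question (0.0 for empty input)."""
    if not responses:
        return 0.0
    return sum(asks_question(r) for r in responses) / len(responses)


replies = ["头痛是单侧还是双侧？", "建议多休息。", "Do you also have a fever?", "Drink water."]
print(questioning_rate(replies))
```

A surface check like this cannot tell a relevant follow-up question from an irrelevant one, which is presumably why a dedicated metric is needed.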

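Proposed Method

Builds BianQueCorpus from real-world multi-turn health conversations, applying automatic cleaning and ChatGPT-based polishing of the doctors' suggestions.
Fine-tunes ChatGLM-6B on BianQueCorpus using a specific input/output dialogue format that marks patient and doctor turns.
Uses a warmup-decay learning-rate schedule during training and sets limits on input and output lengths.
Evaluates with the standard BLEU and ROUGE metrics, plus the new Proactive Questioning Ability (PQA) metric.
Provides reproducibility details, including model size (6.2B parameters) and training settings.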
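
The review mentions a warmup-decay learning-rate schedule but does not give its shape. The sketch below assumes linear warmup followed by linear decay, the same shape as Hugging Face's `get_linear_schedule_with_warmup`. The function `warmup_decay_lr` and the example values are hypothetical, not the paper's settings.

```python
# Sketch of a warmup-then-decay learning-rate schedule.
# Assumptions: linear warmup and linear decay; peak_lr, warmup_steps and
# total_steps are illustrative values, not BianQue's reported configuration.


def warmup_decay_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step."""
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps, clamped at 0 afterwards.
    decay_steps = max(total_steps - warmup_steps, 1)
    return peak_lr * max(total_steps - step, 0) / decay_steps


for s in (0, 50, 100, 550, 1000):
    print(s, warmup_decay_lr(s, peak_lr=1e-4, warmup_steps=100, total_steps=1000))
```

Warmup avoids large, noisy updates while optimizer statistics are still settling. The decay then lets fine-tuning converge smoothly.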

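Experimental Results

Research Questions

RQ1: Can a health LLM conduct meaningful multi-turn questioning (CoQ) while maintaining high-quality health suggestions?
RQ2: Does fine-tuning on a co-constructed multi-turn corpus improve the balance between questioning and suggestion compared with existing models?
RQ3: How does BianQue perform against ChatGPT and other health LLMs on Chinese health-conversation benchmarks?
RQ4: How well does the PQA metric capture CoQ performance?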

Key Findings

Dataset	Model	BLEU-1	BLEU-2	BLEU-3	BLEU-4	ROUGE-1	ROUGE-2	ROUGE-L	PQA
MedDialog-CN	ChatGLM-6B	7.28	3.72	2.10	1.23	10.86	0.92	7.43	0.20
MedDialog-CN	DoctorGLM	10.39	5.06	2.94	1.80	13.27	1.04	11.17	0.01
MedDialog-CN	ChatGPT	7.61	3.90	2.21	1.30	11.11	0.96	7.82	0.28
MedDialog-CN	BianQue	11.12	6.50	4.42	3.10	15.55	2.15	12.96	0.53
IMCS-V2	ChatGLM-6B	6.83	3.61	2.12	1.30	10.24	1.03	7.26	0.36
IMCS-V2	DoctorGLM	8.38	4.22	2.52	1.55	11.87	0.95	9.22	0.06
IMCS-V2	ChatGPT	8.46	4.54	2.71	1.70	11.48	1.29	8.97	0.38
IMCS-V2	BianQue	14.50	10.16	7.85	6.23	21.73	6.24	19.09	0.70
CHIP-MDCFNPC	ChatGLM-6B	6.22	3.11	1.81	1.10	9.62	0.85	0.67	0.35
CHIP-MDCFNPC	DoctorGLM	8.59	4.33	2.68	1.71	12.05	1.11	9.68	0.05
CHIP-MDCFNPC	ChatGPT	7.52	3.74	2.20	1.36	10.51	0.97	8.03	0.38
CHIP-MDCFNPC	BianQue	13.41	8.49	6.05	4.42	19.00	3.99	16.56	0.57
MedDG	ChatGLM-6B	4.76	2.31	1.34	0.81	7.35	0.56	5.06	0.47
MedDG	DoctorGLM	6.87	3.47	2.15	1.35	9.62	0.88	7.61	0.09
MedDG	ChatGPT	5.11	2.41	1.38	0.83	7.58	0.50	5.46	0.63
MedDG	BianQue	14.86	10.43	8.09	6.37	21.56	6.46	19.56	0.81
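
The BLEU columns in the table are built on clipped n-gram precision. Below is a minimal sketch of cumulative sentence-level BLEU with a brevity penalty, assuming character-level tokens for Chinese and a single reference. The review does not say which BLEU variant, tokenizer, or smoothing the authors used, so this sketch is not expected to reproduce the table's numbers.

```python
# Minimal sentence-level BLEU-N sketch (single reference, no smoothing).
# Assumptions: character-level tokens and cumulative BLEU, i.e. the geometric
# mean of 1..N-gram precisions; the paper's exact variant is not stated.
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_n(candidate: List[str], reference: List[str], max_n: int) -> float:
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clipped ("modified") precision: an n-gram is credited at most as often as it appears in the reference.
        matched = sum(min(count, ref[g]) for g, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


ref = list("建议多喝水并休息")
hyp = list("建议多休息")
print(round(bleu_n(hyp, ref, 1), 4), round(bleu_n(hyp, ref, 2), 4))
```

Higher-order n-grams are harder to match, so scores tend to fall from BLEU-1 to BLEU-4. That is consistent with every row of the table.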

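BianQue achieves higher BLEU and ROUGE scores than every baseline on all four datasets: MedDialog-CN, IMCS-V2, CHIP-MDCFNPC, and MedDG.
BianQue shows the highest Proactive Questioning Ability (PQA) on every evaluated dataset.
BianQue outperforms ChatGLM-6B, DoctorGLM, and ChatGPT on multiple metrics, with notable gains in BLEU-1/2/3/4 and ROUGE-L.
Together, the dataset and model balance questioning and suggestion, suggesting that CoQ ability can be improved without sacrificing suggestion quality.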
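
To make these findings concrete, the short script below uses only the table values to compute BianQue's margin over the strongest baseline on ROUGE-L and PQA for each dataset. The dictionary layout and the helper name `margin_over_best_baseline` are illustrative.

```python
# Uses the values from the Key Findings table (nothing is re-measured) to report,
# per dataset, which baseline is strongest and BianQue's margin over it.
SCORES = {
    # dataset: {model: (ROUGE-L, PQA)}
    "MedDialog-CN": {"ChatGLM-6B": (7.43, 0.20), "DoctorGLM": (11.17, 0.01),
                     "ChatGPT": (7.82, 0.28), "BianQue": (12.96, 0.53)},
    "IMCS-V2": {"ChatGLM-6B": (7.26, 0.36), "DoctorGLM": (9.22, 0.06),
                "ChatGPT": (8.97, 0.38), "BianQue": (19.09, 0.70)},
    "CHIP-MDCFNPC": {"ChatGLM-6B": (0.67, 0.35), "DoctorGLM": (9.68, 0.05),
                     "ChatGPT": (8.03, 0.38), "BianQue": (16.56, 0.57)},
    "MedDG": {"ChatGLM-6B": (5.06, 0.47), "DoctorGLM": (7.61, 0.09),
              "ChatGPT": (5.46, 0.63), "BianQue": (19.56, 0.81)},
}


def margin_over_best_baseline(dataset: str, metric_index: int) -> tuple:
    """Return (strongest baseline, BianQue minus that baseline) for metric 0=ROUGE-L or 1=PQA."""
    rows = SCORES[dataset]
    baselines = {m: v[metric_index] for m, v in rows.items() if m != "BianQue"}
    best_model = max(baselines, key=baselines.get)
    return best_model, round(rows["BianQue"][metric_index] - baselines[best_model], 2)


for ds in SCORES:
    print(ds, "ROUGE-L:", margin_over_best_baseline(ds, 0), "PQA:", margin_over_best_baseline(ds, 1))
```

On these figures, DoctorGLM is the strongest ROUGE-L baseline on all four datasets and ChatGPT is the strongest PQA baseline. DoctorGLM also has the lowest PQA of all models (0.01 to 0.09). This illustrates the questioning/suggestion trade-off that BianQue targets.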


This review was generated by AI and checked by a human editor.