Skip to main content
QUICK REVIEW

[論文レビュー] Large Language Models Perform Diagnostic Reasoning

Cheng-Kuang Wu, Weilin Chen|arXiv (Cornell University)|Jul 18, 2023
Topic Modeling被引用数 9
ひとこと要約

DR-CoT prompting improves diagnostic accuracy of LLMs for automatic diagnosis by about 15% over standard prompting, with an 18% gain in out-domain settings.

ABSTRACT

We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by doctors' underlying reasoning process, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results demonstrate that by simply prompting large language models trained only on general text corpus with two DR-CoT exemplars, the diagnostic accuracy improves by 15% comparing to standard prompting. Moreover, the gap reaches a pronounced 18% in out-domain settings. Our findings suggest expert-knowledge reasoning in large language models can be elicited through proper promptings.

研究の動機と目的

  • Motivate extending chain-of-thought prompting to medical reasoning for automatic diagnosis.
  • Propose Diagnostic-Reasoning CoT (DR-CoT) to elicit expert-like reasoning in LLMs.
  • Develop a few-shot LLM-based dialogue system for automatic diagnosis.
  • Introduce a language-model-role-playing evaluation framework to simulate patient-doctor interactions.

提案手法

  • Prompt LLMs with a two-shot DR-CoT template to guide evidence gathering and differential diagnosis generation.
  • Augment the instruction to summarize evidence and formulate a differential diagnosis before formulating the next question.
  • Replace standard prompts with a DR-CoT-driven prompt that ties evidence to a ranked differential and next query.
  • Use a non-pipelined, few-shot dialogue setup where the model generates questions and a final diagnosis.
  • Evaluate using a language-model-role-playing framework where the LLM acts as both doctor and patient in self-chat dialogues.
  • Conduct experiments on the DDXPlus dataset with in-domain and out-domain splits.
Figure 3: The initial prompt includes the instruction I , the shots S , and the input D . The generated question $q_{i}$ of the prompted model (i.e., the DSAD) and the answer $a_{i}$ from the patient bot is presented in the remaining text in black.
Figure 3: The initial prompt includes the instruction I , the shots S , and the input D . The generated question $q_{i}$ of the prompted model (i.e., the DSAD) and the answer $a_{i}$ from the patient bot is presented in the remaining text in black.

実験結果

リサーチクエスチョン

  • RQ1Can DR-CoT prompting improve diagnostic accuracy of LLM-based automatic diagnosis compared to standard prompting?
  • RQ2Does DR-CoT generalize to out-domain initial evidences beyond the exemplars?
  • RQ3Does the DR-CoT approach lead to more informative questioning that supports correct diagnoses?
  • RQ4Is a language-model-role-playing evaluation framework a viable proxy for realistic DSAD assessment?

主な発見

  • DR-CoT prompting yields a 15% improvement in diagnostic accuracy over standard prompting.
  • The accuracy improvement with DR-CoT is 18% in out-domain settings.
  • Two-shot exemplars with DR-CoT significantly enhance convergence speed and diagnostic performance.
  • A physician-evaluated human study supports that DR-CoT prompts help the model ask more critical questions.
  • The evaluation framework using role-playing between doctor and patient enables automated, end-to-end assessment.
Large Language Models Perform Diagnostic Reasoning

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。