[論文レビュー] Large Language Models Perform Diagnostic Reasoning
DR-CoT prompting improves diagnostic accuracy of LLMs for automatic diagnosis by about 15% over standard prompting, with an 18% gain in out-domain settings.
We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by doctors' underlying reasoning process, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results demonstrate that by simply prompting large language models trained only on general text corpus with two DR-CoT exemplars, the diagnostic accuracy improves by 15% comparing to standard prompting. Moreover, the gap reaches a pronounced 18% in out-domain settings. Our findings suggest expert-knowledge reasoning in large language models can be elicited through proper promptings.
研究の動機と目的
- Motivate extending chain-of-thought prompting to medical reasoning for automatic diagnosis.
- Propose Diagnostic-Reasoning CoT (DR-CoT) to elicit expert-like reasoning in LLMs.
- Develop a few-shot LLM-based dialogue system for automatic diagnosis.
- Introduce a language-model-role-playing evaluation framework to simulate patient-doctor interactions.
提案手法
- Prompt LLMs with a two-shot DR-CoT template to guide evidence gathering and differential diagnosis generation.
- Augment the instruction to summarize evidence and formulate a differential diagnosis before formulating the next question.
- Replace standard prompts with a DR-CoT-driven prompt that ties evidence to a ranked differential and next query.
- Use a non-pipelined, few-shot dialogue setup where the model generates questions and a final diagnosis.
- Evaluate using a language-model-role-playing framework where the LLM acts as both doctor and patient in self-chat dialogues.
- Conduct experiments on the DDXPlus dataset with in-domain and out-domain splits.

実験結果
リサーチクエスチョン
- RQ1Can DR-CoT prompting improve diagnostic accuracy of LLM-based automatic diagnosis compared to standard prompting?
- RQ2Does DR-CoT generalize to out-domain initial evidences beyond the exemplars?
- RQ3Does the DR-CoT approach lead to more informative questioning that supports correct diagnoses?
- RQ4Is a language-model-role-playing evaluation framework a viable proxy for realistic DSAD assessment?
主な発見
- DR-CoT prompting yields a 15% improvement in diagnostic accuracy over standard prompting.
- The accuracy improvement with DR-CoT is 18% in out-domain settings.
- Two-shot exemplars with DR-CoT significantly enhance convergence speed and diagnostic performance.
- A physician-evaluated human study supports that DR-CoT prompts help the model ask more critical questions.
- The evaluation framework using role-playing between doctor and patient enables automated, end-to-end assessment.

より良い研究を、今すぐ始めましょう
論文設計から論文執筆まで、研究時間を劇的に削減しましょう。
クレジットカード登録不要
このレビューはAIが作成し、人間の編集者が確認しました。