QUICK REVIEW

[Paper Review] Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology

Dyke Ferber, Omar S. M. El Nahhas|arXiv (Cornell University)|Apr 6, 2024

Radiomics and Machine Learning in Medical Imaging | Cited by 12

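One-line summary

The paper presents an autonomous AI agent framework that uses a large language model (GPT-4) as a reasoning engine to integrate specialized clinical tools for multimodal oncology decision support, and validates it on complex GI cancer cases using expert-centered evaluation.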

ABSTRACT

Multimodal artificial intelligence (AI) systems have the potential to enhance clinical decision-making by interpreting various types of medical data. However, the effectiveness of these models across all medical fields is uncertain. Each discipline presents unique challenges that need to be addressed for optimal performance. This complexity is further increased when attempting to integrate different fields into a single model. Here, we introduce an alternative approach to multimodal medical AI that utilizes the generalist capabilities of a large language model (LLM) as a central reasoning engine. This engine autonomously coordinates and deploys a set of specialized medical AI tools. These tools include text, radiology and histopathology image interpretation, genomic data processing, web searches, and document retrieval from medical guidelines. We validate our system across a series of clinical oncology scenarios that closely resemble typical patient care workflows. We show that the system has a high capability in employing appropriate tools (97%), drawing correct conclusions (93.6%), and providing complete (94%), and helpful (89.2%) recommendations for individual patient cases while consistently referencing relevant literature (82.5%) upon instruction. This work provides evidence that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information when used as autonomous agents. This enables them to function as specialist, patient-tailored clinical assistants. It also simplifies regulatory compliance by allowing each component tool to be individually validated and approved. We believe, that our work can serve as a proof-of-concept for more advanced LLM-agents in the medical domain.

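Motivation and objectives

Motivates the need for domain-specific multimodal AI in oncology and addresses the limitations of generalist models.
Proposes a modular AI agent framework that uses an LLM as the reasoning engine directing specialized tools.
Grounds the agent in a curated oncology knowledge base with strict document retrieval.
Evaluates the agent on realistic multimodal GI oncology cases with human expert assessment.
Shows the regulatory and maintenance advantages of modular, per-tool validation over monolithic models.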

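Proposed method

Builds an autonomous AI agent with GPT-4 as the reasoning core.
Integrates specialized tools: radiology vision (GPT-4V), histopathology-based gene/mutation prediction, OncoKB, web search, a calculator, and medical image segmentation (MedSAM).
Builds a Retrieval-Augmented Generation (RAG) knowledge base from about 6,800 oncology documents using embeddings and cosine-similarity search.
Generates a multi-step plan and sub-queries, retrieves relevant passages, and cites a source for each claim.
Evaluates tool use, answer completeness, factual accuracy, helpfulness, and citation alignment through blinded expert review on 11 synthetic cases.
Acknowledged limitations include single-slice radiology images, the limitations of GPT-4V, the lack of follow-up questions, and the focus on oncology; future modular extensions are proposed.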
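
The retrieval step described above (embed the passages, rank them by cosine similarity to a sub-query, keep the source index so each claim can be cited) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `retrieve` and `PASSAGES` are hypothetical names, and the passages are placeholders, not content from the paper's knowledge base.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

def retrieve(query: str, passages: list[str], top_k: int = 2):
    """Return up to top_k (index, passage, score) tuples, best match first."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(p)), i, p) for i, p in enumerate(passages)]
    # Sort by descending score; break ties by original passage order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(i, p, s) for s, i, p in scored[:top_k]]

# Placeholder passages (hypothetical, for illustration only).
PASSAGES = [
    "placeholder passage about msi status and immunotherapy",
    "placeholder passage about kras mutation and targeted therapy",
    "placeholder passage about tumor segmentation on imaging",
]

if __name__ == "__main__":
    for index, passage, score in retrieve("msi status immunotherapy", PASSAGES):
        # The index doubles as a citation handle for the retrieved source.
        print(f"[{index}] score={score:.3f} {passage}")
```

In a real system the bag-of-words function would be replaced by a learned embedding model and the linear scan by a vector index; the ranking logic stays the same.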

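Experimental results

Research questions

RQ1: Can an LLM-based agent autonomously plan and execute a set of specialized medical tools to support oncology decision-making?
RQ2: Does tool-assisted reasoning improve the accuracy, completeness, and evidence grounding of clinical recommendations in multimodal oncology scenarios?
RQ3: How well can Retrieval-Augmented Generation and modular tools align model outputs with current guidelines and literature?
RQ4: What are the regulatory and maintenance advantages of a modular, tool-specific architecture compared with a monolithic generalist model?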

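Key findings

The agent invoked tools consistently across cases, averaging 3 tool uses per patient, with 1 reported failure and 1 omission.
Histopathology-based mutation and MSI status prediction achieved high accuracy on 7 cases that included TCGA data.
GPT-4V guided clinical judgment toward an accurate assessment of disease course, despite occasional omissions or excessive detail.
Completeness reached 94% of the 67 required statements assessed by medical experts.
Overall factual accuracy of the model's claims was 93.6%; 4.3% were incorrect and 2.1% were potentially harmful.
Citations were consistent with their sources in 82.5% of cases; 15.2% were irrelevant and 2.3% contradictory, with limited hallucination.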
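
As a quick sanity check on the figures above, the sketch below confirms that each reported breakdown sums to 100% and back-calculates a statement count from the completeness rate. The percentages and the total of 67 come from this review; the resulting count is an approximation derived here, not a number reported in the source, and the function names are hypothetical.

```python
# Percentages as listed in the key findings above.
REPORTED = {
    "factual accuracy": {"correct": 93.6, "incorrect": 4.3, "potentially harmful": 2.1},
    "citation alignment": {"consistent": 82.5, "irrelevant": 15.2, "contradictory": 2.3},
}

def total_percent(breakdown: dict[str, float]) -> float:
    """Sum a percentage breakdown, rounded to one decimal to absorb float error."""
    return round(sum(breakdown.values()), 1)

def approx_count(percent: float, total: int) -> int:
    """Back-calculate the nearest whole count implied by a percentage of a total."""
    return round(percent / 100 * total)

if __name__ == "__main__":
    for name, breakdown in REPORTED.items():
        print(f"{name}: categories sum to {total_percent(breakdown)}%")
    # Assumption: 94% completeness refers to the 67 required statements.
    print("approx. statements covered:", approx_count(94.0, 67), "of 67")
```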


This review was generated by AI and checked by a human editor.