QUICK REVIEW

[Paper Review] Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology

Dyke Ferber, Omar S. M. El Nahhas | arXiv (Cornell University) | 2024-04-06

Radiomics and Machine Learning in Medical Imaging | Citations: 12

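One-line summary

This paper presents an autonomous AI agent framework that uses a large language model (GPT-4) as a reasoning engine to coordinate multimodal clinical tools, and validates it as oncology decision support on complex GI cancer cases with expert-led evaluation.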

ABSTRACT

Multimodal artificial intelligence (AI) systems have the potential to enhance clinical decision-making by interpreting various types of medical data. However, the effectiveness of these models across all medical fields is uncertain. Each discipline presents unique challenges that need to be addressed for optimal performance. This complexity is further increased when attempting to integrate different fields into a single model. Here, we introduce an alternative approach to multimodal medical AI that utilizes the generalist capabilities of a large language model (LLM) as a central reasoning engine. This engine autonomously coordinates and deploys a set of specialized medical AI tools. These tools include text, radiology and histopathology image interpretation, genomic data processing, web searches, and document retrieval from medical guidelines. We validate our system across a series of clinical oncology scenarios that closely resemble typical patient care workflows. We show that the system has a high capability in employing appropriate tools (97%), drawing correct conclusions (93.6%), and providing complete (94%), and helpful (89.2%) recommendations for individual patient cases while consistently referencing relevant literature (82.5%) upon instruction. This work provides evidence that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information when used as autonomous agents. This enables them to function as specialist, patient-tailored clinical assistants. It also simplifies regulatory compliance by allowing each component tool to be individually validated and approved. We believe, that our work can serve as a proof-of-concept for more advanced LLM-agents in the medical domain.

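Research motivation and goals

Makes the case for domain-specific multimodal AI in oncology and addresses the limitations of generalist models.
Proposes a modular AI agent framework that uses an LLM as the reasoning engine to coordinate specialized tools.
Backs the agent with a curated oncology knowledge base and a rigorous document retrieval scheme.
Evaluates the agent on realistic multimodal GI oncology cases, including human evaluation by experts.
Demonstrates the regulatory and maintenance advantages of validating each tool separately, compared with a single model.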

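Proposed method

Builds an autonomous AI agent with GPT-4 as its reasoning core.
Integrates specialist tools: radiology vision (GPT-4V), histopathology-based gene/mutation predictors, OncoKB, web search, a calculator, and medical image segmentation (MedSAM).
Builds a RAG (retrieval-augmented generation) knowledge base from ~6,800 oncology documents using embeddings and cosine-similarity search.
Generates multi-step plans and sub-queries, retrieves relevant passages, and cites a source for each claim.
Assesses tool use, answer completeness, factual accuracy, helpfulness, and citation alignment across 11 synthetic cases through blinded expert evaluation.
Acknowledges limitations (single-slice radiology, the limits of GPT-4V, no follow-up questions, an oncology-only focus) and proposes modular future extensions.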
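The retrieval step described above (embed, rank by cosine similarity, cite the source) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-document corpus, the document ids, and the bag-of-words `embed` function are all hypothetical stand-ins, and a real system would use a learned embedding model over the full knowledge base.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; ids and texts are made up for illustration.
DOCS = {
    "guideline-a": "MSI-high colorectal cancer may respond to immune checkpoint inhibitors",
    "guideline-b": "BRAF V600E mutation in colorectal cancer and targeted therapy options",
    "guideline-c": "Staging of pancreatic cancer with contrast-enhanced CT imaging",
}

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, docs, k=2):
    """Return the k most similar (doc_id, score) pairs, best first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def answer_with_citation(query, docs):
    """Return the best-matching passage with its source id attached."""
    doc_id, _ = retrieve(query, docs, k=1)[0]
    return f"{docs[doc_id]} [source: {doc_id}]"

print(answer_with_citation("BRAF mutation targeted therapy", DOCS))
```

Keeping the source id attached to every retrieved passage is what makes a per-claim citation check possible later on.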

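Experimental results

Research questions

RQ1: Can an LLM-based agent autonomously plan and execute sequences of specialized medical tools to support oncology decision-making?
RQ2: How much does tool-assisted reasoning contribute to the accuracy, completeness, and evidence base of clinical recommendations in multimodal oncology scenarios?
RQ3: How well can retrieval-augmented generation (RAG) and modular tools align model outputs with current guidelines and literature?
RQ4: What regulatory and maintenance advantages does a modular, tool-specific architecture offer over a single monolithic generalist model?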
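RQ1 is about planning and executing a sequence of tools. A minimal sketch of such an execution loop is shown here, under loud assumptions: the tool names, the stub functions, and the hard-coded plan are all hypothetical, and in the agent setting the plan would be proposed by the LLM rather than written by hand.

```python
from typing import Callable, Dict, List, Tuple

# Stub tools: each returns a placeholder string instead of calling a real model.
def radiology_report(arg: str) -> str:
    return f"radiology findings for {arg}"

def mutation_predictor(arg: str) -> str:
    return f"predicted mutation status for {arg}"

def guideline_search(arg: str) -> str:
    return f"guideline passages about {arg}"

# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "radiology": radiology_report,
    "histopathology": mutation_predictor,
    "search": guideline_search,
}

def run_plan(plan: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Execute (tool_name, argument) steps in order.

    Returns the collected observations and a list of failed steps, so a
    missing or crashing tool is recorded instead of aborting the run.
    """
    observations, failures = [], []
    for tool_name, argument in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            failures.append(f"{tool_name}: unknown tool")
            continue
        try:
            observations.append(tool(argument))
        except Exception as exc:  # keep going; report the failure
            failures.append(f"{tool_name}: {exc}")
    return observations, failures

# Hard-coded stand-in for an LLM-generated plan.
plan = [
    ("radiology", "ct_slice.png"),
    ("histopathology", "slide_01"),
    ("genomics", "panel.json"),  # not registered, so it is logged as a failure
]
observations, failures = run_plan(plan)
print(observations)
print(failures)
```

Because each tool sits behind its own entry in the registry, a single tool can be replaced or validated without touching the loop, which is the modularity argument raised in RQ4.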

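Key results

The agent invoked tools consistently across cases, averaging three tool uses per patient, with one failure and one omission reported.
In the seven cases that included TCGA data, histopathology-based prediction of mutations and MSI status achieved high accuracy.
Despite occasional omissions or unnecessary details, GPT-4V steered clinical decision-making toward correct assessments of disease course.
Completeness reached 94% across the 67 required statements assessed by medical experts.
Factual accuracy across all model claims was 93.6%; 4.3% of claims were incorrect and 2.1% were potentially harmful.
Citations that matched their sources accounted for 82.5% of references; 15.2% were irrelevant and 2.3% conflicted with the source; hallucination was limited.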
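As a quick sanity check on the figures above, the two percentage breakdowns should each sum to 100, and the completeness rate can be turned into an approximate statement count. The percentages are copied from the results. The derived count is an estimate made here, not a number reported in the review.

```python
# Percentages as reported in the key results.
accuracy = {"correct": 93.6, "incorrect": 4.3, "potentially_harmful": 2.1}
citations = {"aligned": 82.5, "irrelevant": 15.2, "conflicting": 2.3}

def sums_to_100(breakdown, tol=0.05):
    """True if the category percentages add up to 100 within a rounding tolerance."""
    return abs(sum(breakdown.values()) - 100.0) <= tol

# Completeness: 94% of 67 required statements.
# The count below is inferred by rounding; it is not reported in the source.
required = 67
covered_estimate = round(0.94 * required)

print(sums_to_100(accuracy), sums_to_100(citations))
print(covered_estimate, "of", required)
```

Under that rounding assumption, 63 of 67 statements corresponds to about 94.0%, which is consistent with the reported completeness.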

This review was generated by AI and checked by a human editor.