QUICK REVIEW

[Paper Review] A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

Hong-Yu Zhou, Yizhou Yu|arXiv (Cornell University)|Jun 1, 2023

COVID-19 diagnosis using AI | Citations: 11

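One-line summary

A Transformer-based model integrates multimodal clinical data (images, text, laboratory values, and demographics) into a unified representation to support diagnosis and to predict outcomes in patients with COVID-19. It outperforms both image-only and non-unified multimodal baselines.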

ABSTRACT

During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results. Deep-learning models for aiding diagnosis have yet to meet this requirement. Here we report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner. Rather than learning modality-specific features, the model uses embedding layers to convert images and unstructured and structured text into visual tokens and text tokens, and bidirectional blocks with intramodal and intermodal attention to learn a holistic representation of radiographs, the unstructured chief complaint and clinical history, structured clinical information such as laboratory-test results and patient demographic information. The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases (by 12% and 9%, respectively) and in the prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively). Leveraging unified multimodal Transformer-based models may help streamline triage of patients and facilitate the clinical decision process.

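Motivation and objectives

Motivate the need for models that integrate multimodal clinical information at diagnosis time.
Develop a Transformer-based representation-learning model that processes images, unstructured text, and structured data in a unified manner.
Show that it improves diagnostic performance over image-only and non-unified multimodal approaches.
Show its potential benefit for triage and clinical decision-making.
Highlight its applicability to radiographs, chief complaints, clinical history, laboratory-test results, and patient demographic information.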

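Proposed method

Use embedding layers to convert images and text (both unstructured and structured) into visual tokens and text tokens.
Learn a holistic representation with bidirectional Transformer blocks that apply intramodal and intermodal attention.
Process radiographs, chief complaints, clinical history, laboratory-test results, and demographic information within a single unified architecture.
Compare the unified multimodal model against image-only and non-unified multimodal baselines.
Evaluate on pulmonary disease identification and on prediction of adverse clinical outcomes in patients with COVID-19.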
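
The tokenization and attention steps above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the random (untrained) projection weights, the toy 4x4 "image", the vocabulary size, and the single-head attention with identity query/key/value projections are all assumptions chosen to keep the example short. It only shows the general idea of placing visual tokens and text tokens in one sequence so that attention can run both within and across modalities.

```python
import math
import random


def patchify(image, patch):
    """Split a 2D image (list of rows) into flattened non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - patch + 1, patch):
        for c in range(0, width - patch + 1, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def linear(vec, weight):
    """Project a length-n vector with an n x d weight matrix."""
    dim = len(weight[0])
    return [sum(v * weight[i][k] for i, v in enumerate(vec)) for k in range(dim)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_unified_sequence(image, text_ids, patch, dim, vocab_size, seed=0):
    """Embed image patches and text ids into one token sequence.

    Weights are random placeholders (hypothetical, untrained).
    """
    rng = random.Random(seed)
    w_patch = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for _ in range(patch * patch)]
    embed = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(vocab_size)]
    visual = [linear(p, w_patch) for p in patchify(image, patch)]
    textual = [embed[i] for i in text_ids]
    modality = ["image"] * len(visual) + ["text"] * len(textual)
    return visual + textual, modality


def self_attention(tokens):
    """Bidirectional single-head attention; Q, K, V are the tokens themselves."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * tokens[j][k] for j in range(len(tokens)))
                        for k in range(dim)])
    return outputs, weights


# Toy inputs: a 4x4 "radiograph" and three hypothetical text token ids.
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
text_ids = [3, 1, 4]
tokens, modality = build_unified_sequence(image, text_ids, patch=2, dim=8,
                                          vocab_size=10)
fused, weights = self_attention(tokens)
```

Because no mask is applied, every row of `weights` spans both the image and the text positions, so each token attends to its own modality and to the other one in the same pass. A real model would add learned projections, positional and modality embeddings, multiple heads, and stacked blocks.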

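Experimental results

Research questions

RQ1: Does the unified multimodal Transformer model outperform an image-only model in identifying pulmonary diseases?
RQ2: Does the unified model improve prediction of adverse COVID-19 outcomes compared with non-unified multimodal approaches?
RQ3: Does integrating multiple data modalities (images, text, and structured data) improve diagnostic decision-making?
RQ4: How do intramodal and intermodal attention affect the learning of clinical representations?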

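Key findings

The unified multimodal model outperformed the image-only model by 12% in pulmonary disease identification.
The unified model outperformed non-unified multimodal models by 9% in pulmonary disease identification.
For adverse COVID-19 outcomes, the unified model improved on the image-only baseline by 29% and on non-unified multimodal models by 7%.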

This review was generated by AI and checked by a human editor.