QUICK REVIEW

[Paper Review] Automated Extraction of Unstructured Post-SBRT Toxicity Data from Radiology Reports Using Large Language Models

Justin Pijanowski, Yakout Mezgueldi | arXiv (Cornell University) | Feb 26, 2026

Topic Modeling | Citations: 0

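One-line summary

Using a prompt-engineered Llama 3.3-70-B-Instruct, the study demonstrates extraction of post-SBRT toxicity and progression outcomes from radiology reports, and reports performance that makes clinical data curation viable.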

ABSTRACT

We evaluated the viability of using a Large Language Model (LLM) to extract patient-specific toxicity and progression outcomes from unstructured radiology reports. We retrospectively extracted 160 follow-up CT and PET/CT electronic medical record notes for patients treated with lung stereotactic body radiotherapy (SBRT) at our institution from January 2017 through December 2023. Using the Llama 3.3-70-B-Instruct LLM, we engineered prompts to extract four clinical endpoints from each radiology report: locoregional progression, distant progression, radiation-induced fibrosis, and radiation-induced rib fractures. Progression endpoints were classified as yes, no, or maybe, while fibrosis and rib fractures were binary (yes or no). Ground truth labels were defined using two-grader consensus for the 60-note training set, used for prompt development, and a three-grader majority vote for the 100-note test set. LLM performance was evaluated using sensitivity, specificity, and accuracy. As detailed by our evaluation metrics, the strong performance of our methods demonstrates the viability of using prompt-engineered LLMs to extract radiation-toxicities and progression classification from radiology reports.

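Motivation and objectives

Motivate the automated extraction of structured toxicity and progression data from unstructured post-SBRT radiology notes.
Develop prompts that map radiology language onto four clinical endpoints (locoregional progression, distant progression, fibrosis, rib fractures).
Create ground-truth labels by multi-annotator agreement so that the prompts can be developed and evaluated.
Evaluate LLM performance with standard metrics to assess the feasibility of radiotherapy toxicity monitoring.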

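Proposed method

Use 160 follow-up CT and PET/CT notes for patients treated with lung SBRT (January 2017 through December 2023).
Apply Llama 3.3-70-B-Instruct with engineered prompts to extract four endpoints from each report.
Endpoints: locoregional progression (yes/no/maybe), distant progression (yes/no/maybe), fibrosis (yes/no), rib fractures (yes/no).
Establish ground truth by two-grader consensus for the 60-note training set (used for prompt development) and by three-grader majority vote for the 100-note test set.
Evaluate performance with sensitivity, specificity, and accuracy.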
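The labeling and scoring steps above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `majority_vote` and `binary_metrics` and the toy votes are hypothetical. The source does not say how a three-way split among graders was resolved or how "maybe" was handled when computing sensitivity and specificity, so the sketch assumes that a vote without a strict majority returns `None` and it scores only a binary (yes/no) endpoint.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by more than half of the graders, else None.

    Returning None for a split vote is an assumption made for this sketch;
    the source does not describe how such cases were resolved.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def binary_metrics(truth, predicted, positive="yes"):
    """Compute sensitivity, specificity, and accuracy for a binary endpoint."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives / all actual positives
        "specificity": ratio(tn, tn + fp),  # true negatives / all actual negatives
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy data for one binary endpoint (e.g. rib fracture); not taken from the paper.
grader_votes = [
    ("yes", "yes", "no"),
    ("no", "no", "no"),
    ("no", "yes", "no"),
    ("yes", "yes", "yes"),
]
ground_truth = [majority_vote(votes) for votes in grader_votes]
llm_labels = ["yes", "no", "yes", "yes"]
metrics = binary_metrics(ground_truth, llm_labels)
print(ground_truth)
print(metrics)
```

With three graders and a binary endpoint a strict majority always exists. A three-way split is only possible for the ternary progression endpoints, where a tie-breaking rule would have to be chosen.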

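Experimental results

Research questions

RQ1: Can a prompt-engineered LLM accurately extract post-SBRT toxicity and progression endpoints from unstructured radiology reports?
RQ2: What sensitivity, specificity, and accuracy does LLM-based extraction achieve for each endpoint?
RQ3: Is the extraction reliable enough to support use in clinical toxicity monitoring and data curation?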

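Key findings

According to the reported evaluation metrics, LLM-based extraction performed strongly on each endpoint.
Ground-truth labeling relied on robust multi-grader agreement (two-grader consensus for the training set, three-grader majority vote for the test set).
The approach supports the viability of prompt-engineered LLMs for extracting radiation toxicity and progression classifications from radiology reports.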

This review was generated by AI and checked by a human editor.