QUICK REVIEW

[Paper Explained] Cross-modal Memory Networks for Radiology Report Generation

Zhihong Chen, Yaling Shen|arXiv (Cornell University)|Apr 28, 2022

Multimodal Machine Learning Applications|Cited by 26

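One-Sentence Summary

This paper proposes a cross-modal memory network (CMN) that stores image-text alignments in a shared memory to improve radiology report generation, and it reports state-of-the-art results on the IU X-Ray and MIMIC-CXR datasets.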

ABSTRACT

Medical imaging plays a significant role in clinical practice of medical diagnosis, where the text reports of the images are essential in understanding them and facilitating later treatments. By generating the reports automatically, it is beneficial to help lighten the burden of radiologists and significantly promote clinical automation, which already attracts much attention in applying artificial intelligence to medical domain. Previous studies mainly follow the encoder-decoder paradigm and focus on the aspect of text generation, with few studies considering the importance of cross-modal mappings and explicitly exploit such mappings to facilitate radiology report generation. In this paper, we propose a cross-modal memory networks (CMN) to enhance the encoder-decoder framework for radiology report generation, where a shared memory is designed to record the alignment between images and texts so as to facilitate the interaction and generation across modalities. Experimental results illustrate the effectiveness of our proposed model, where state-of-the-art performance is achieved on two widely used benchmark datasets, i.e., IU X-Ray and MIMIC-CXR. Further analyses also prove that our model is able to better align information from radiology images and texts so as to help generating more accurate reports in terms of clinical indicators.

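Motivation and Objectives

Advance automatic radiology report generation to reduce radiologists' workload.
Explicitly model and exploit the cross-modal alignment between chest X-ray images and their reports.
Introduce a memory-based intermediary that stores shared cross-modal information.
Enhance the Transformer encoder-decoder with memory-driven cross-modal interaction.
Demonstrate state-of-the-art performance on two benchmark datasets.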

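Proposed Method

Extract region-based image features with a CNN-based visual extractor.
Introduce a cross-modal memory network (CMN) that encodes image-text alignment in a shared memory matrix.
Query the memory by mapping the input features and the memory vectors into the same space and selecting the top-K memory vectors.
Produce the memory response for each visual or textual input as a weighted sum of the transformed memory vectors selected for it.
Feed the memory responses into a Transformer-based encoder-decoder to generate the radiology report.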
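
The querying and responding steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the projection matrices `W_q`, `W_k`, and `W_v`, the scaled dot-product scoring, the softmax over the selected slots, the random initialization, and all sizes are hypothetical choices made for the example.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def query_memory(x, memory, W_q, W_k, W_v, top_k):
    """Return (response, selected slot indices, weights) for one input feature."""
    # Map the input feature and every memory vector into the same space.
    q = matvec(x, W_q)
    keys = [matvec(m, W_k) for m in memory]

    # Assumption: similarity is a scaled dot product.
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]

    # Keep only the top-K best-matching memory slots.
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Assumption: weights are a softmax over the selected scores.
    max_score = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - max_score) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Response: weighted sum of the transformed (value-projected) memory vectors.
    values = [matvec(memory[i], W_v) for i in top]
    response = [sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))]
    return response, top, weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, n_slots, top_k = 8, 16, 4           # hypothetical sizes
memory = random_matrix(n_slots, d, rng)  # one memory matrix shared by both modalities
W_q, W_k, W_v = (random_matrix(d, d, rng) for _ in range(3))

visual_feature = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stands in for a CNN region feature
text_feature = [rng.gauss(0.0, 1.0) for _ in range(d)]    # stands in for a token embedding

visual_response, visual_slots, visual_weights = query_memory(
    visual_feature, memory, W_q, W_k, W_v, top_k)
text_response, text_slots, text_weights = query_memory(
    text_feature, memory, W_q, W_k, W_v, top_k)
```

Both modalities query the same `memory` here, which is the point of sharing it: an image region and a report token that select overlapping slots receive related responses. In a trained model the memory and projections would be learned rather than sampled at random.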

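Experimental Results

Research Questions

RQ1: Does a shared cross-modal memory improve alignment and generation quality in radiology report generation?
RQ2: On standard radiology benchmarks, does CMN outperform baselines with single-modality memory or no memory?
RQ3: How do memory size and query parameters affect generation quality and alignment?
RQ4: Does using memory jointly in encoding and decoding yield larger gains than using memory only in decoding?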

Key Findings

Model	BL-1	BL-2	BL-3	BL-4	MTR	RG-L	P	R	F1
IU X-Ray Base	0.396	0.254	0.179	0.135	-	-	-	-	-
IU X-Ray +mem	0.443	0.270	0.191	0.144	-	-	-	-	-
IU X-Ray +cmn	0.475	0.309	0.222	0.170	-	-	-	-	-
MIMIC-CXR Base	0.314	0.192	0.127	0.090	0.125	0.265	-	-	-
MIMIC-CXR +mem	0.340	0.209	0.140	0.100	0.135	0.273	0.322	0.255	0.261
MIMIC-CXR +cmn	0.353	0.218	0.148	0.106	0.142	0.278	0.334	0.275	0.278

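CMN-based models outperform the baseline on natural language generation (NLG) metrics and, where reported, on clinical efficacy (CE) metrics.
Base+cmn achieves the best reported BLEU, METEOR, and ROUGE-L scores among the evaluated models on IU X-Ray and MIMIC-CXR.
Memory-augmented encoding plus decoding yields larger gains than using memory only in decoding (Base+cmn > Base+mem).
A larger memory generally helps up to a moderate number of memory vectors N; beyond that point the gains saturate or decline because the memory is not sufficiently updated.
Qualitative analysis and case studies show that the model learns meaningful image-text mappings and alignments.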
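
To put the Base+cmn > Base+mem comparison in concrete terms, the short script below computes relative BL-4 gains over the Base model. The scores are copied from the table above; the helper name `relative_gain` and the choice of BL-4 as the example metric are illustrative assumptions.

```python
# BL-4 scores copied from the results table.
bleu4 = {
    "IU X-Ray": {"Base": 0.135, "+mem": 0.144, "+cmn": 0.170},
    "MIMIC-CXR": {"Base": 0.090, "+mem": 0.100, "+cmn": 0.106},
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


gains = {}
for dataset, scores in bleu4.items():
    gains[dataset] = {
        variant: relative_gain(scores[variant], scores["Base"])
        for variant in ("+mem", "+cmn")
    }
    for variant, gain in gains[dataset].items():
        print(f"{dataset} {variant}: {gain:.1f}% BL-4 gain over Base")
```

On these numbers, +cmn improves BL-4 over Base by about 25.9% on IU X-Ray and 17.8% on MIMIC-CXR, compared with about 6.7% and 11.1% for +mem.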

This review was generated by AI and checked by a human editor.