QUICK REVIEW

[Paper Review] PathVQA: 30000+ Questions for Medical Visual Question Answering

Xuehai He, Yichen Zhang | arXiv (Cornell University) | Mar 7, 2020

Multimodal Machine Learning Applications | References: 33 | Cited by: 85

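One-Sentence Summary

This paper presents PathVQA, a pathology VQA dataset containing 32,799 question-answer pairs over 4,998 images, generated semi-automatically from textbooks, together with baselines from several VQA models.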

ABSTRACT

Is it possible to develop an "AI Pathologist" to pass the board-certified examination of the American Board of Pathology? To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer. Our work makes the first attempt to build such a dataset. Different from creating general-domain VQA datasets where the images are widely accessible and there are many crowdsourcing workers available and capable of generating question-answer pairs, developing a medical VQA dataset is much more challenging. First, due to privacy concerns, pathology images are usually not publicly available. Second, only well-trained pathologists can understand pathology images, but they barely have time to help create datasets for AI research. To address these challenges, we resort to pathology textbooks and online digital libraries. We develop a semi-automated pipeline to extract pathology images and captions from textbooks and generate question-answer pairs from captions using natural language processing. We collect 32,799 open-ended questions from 4,998 pathology images where each question is manually checked to ensure correctness. To our best knowledge, this is the first dataset for pathology VQA. Our dataset will be released publicly to promote research in medical VQA.

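Motivation and Objectives

Create a pathology VQA dataset so that AI can tackle American Board of Pathology (ABP) style pathology questions.
Use publicly available pathology textbooks and digital libraries to sidestep the privacy and annotation bottlenecks.
Provide a baseline VQA benchmark that compares several state-of-the-art models.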

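Proposed Method

Extract pathology images and captions from online textbooks and the PEIR Digital Library.
Generate question-answer pairs from the captions semi-automatically, using NLP and a rule-based rewriter.
Manually review and post-process the generated questions to correct grammar and semantics.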
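
To make the caption-to-question step concrete, here is a minimal sketch of a rule-based rewriter. It is not the authors' pipeline: this review does not list the actual rules, so the single regular-expression rule, the function name `caption_to_qa`, and the example caption are all assumptions chosen for illustration. The sketch handles only simple "X is/are Y" captions and returns nothing otherwise, which is the kind of gap that manual review would then have to cover.

```python
import re

# Hypothetical toy rule: "<subject> is/are/was/were <rest>."
# The real rewriting rules are not described in this review.
AUX_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<aux>is|are|was|were)\s+(?P<rest>.+?)\.?$",
    re.IGNORECASE,
)


def caption_to_qa(caption):
    """Turn a simple declarative caption into (question, answer) pairs.

    Returns an empty list when the caption does not fit the toy rule.
    """
    match = AUX_PATTERN.match(caption.strip())
    if match is None:
        return []

    subject = match.group("subject")
    aux = match.group("aux").lower()
    rest = match.group("rest")

    # Lowercase the first letter because the subject moves away from the
    # sentence start. This would wrongly lowercase proper nouns, which is
    # one reason generated questions need a manual check.
    subject_lc = subject[0].lower() + subject[1:]

    # Yes/no question: move the auxiliary verb to the front.
    yes_no = (f"{aux.capitalize()} {subject_lc} {rest}?", "yes")
    # "What" question: replace the subject with a wh-word.
    what = (f"What {aux} {rest}?", subject_lc)
    return [yes_no, what]


if __name__ == "__main__":
    for question, answer in caption_to_qa("The liver is enlarged and fatty."):
        print(question, "->", answer)
```

A caption such as "Liver, fatty change" contains no auxiliary verb, so the sketch yields no questions for it. A fuller rewriter would need parsing-based rules for such cases.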

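Experimental Results

Research Questions

RQ1 Can an AI system answer board-style pathology questions from images and their associated captions?
RQ2 How large and diverse is the pathology VQA dataset, and can it be shared with the research community?
RQ3 How do current VQA models perform as baselines on open-ended and yes/no pathology questions?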

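Key Findings

The PathVQA dataset contains 32,799 QA pairs over 4,998 images.
There are 7 question categories; 49.8% of questions are yes/no and 40.9% are open-ended "what" questions.
Baseline yes/no accuracy ranges from 57.6% to 68.2%, depending on the method.
Open-ended QA metrics (BLEU-1/2/3, exact match, F1) are comparatively low (for example, BLEU-1 peaks at 32.4 and exact match falls as low as 2.9%).
Method 1, which uses bilinear attention and region proposals, achieves the best yes/no accuracy (68.2%).
Method 3 with Faster R-CNN features, compared with whole-image features, improves yes/no accuracy to 62.0%.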
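
The open-ended metrics named above can be sketched as follows. This review does not say exactly how the paper computes them, so the definitions here are assumptions: whitespace tokenization with lowercasing, SQuAD-style token-overlap F1, and a single-reference BLEU-1 (clipped unigram precision times a brevity penalty). The function names are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def _overlap(pred_toks, ref_toks):
    """Number of shared tokens, counting each token at most as often as it
    appears in both sequences (multiset intersection)."""
    return sum((Counter(pred_toks) & Counter(ref_toks)).values())


def exact_match(pred, ref):
    """1.0 if the token sequences are identical, else 0.0."""
    return float(tokenize(pred) == tokenize(ref))


def token_f1(pred, ref):
    """Harmonic mean of token-level precision and recall."""
    pred_toks, ref_toks = tokenize(pred), tokenize(ref)
    overlap = _overlap(pred_toks, ref_toks)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)


def bleu1(pred, ref):
    """Single-reference BLEU-1: clipped unigram precision times a
    brevity penalty for predictions shorter than the reference."""
    pred_toks, ref_toks = tokenize(pred), tokenize(ref)
    if not pred_toks:
        return 0.0
    precision = _overlap(pred_toks, ref_toks) / len(pred_toks)
    if len(pred_toks) >= len(ref_toks):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_toks) / len(pred_toks))
    return brevity_penalty * precision


if __name__ == "__main__":
    pred, ref = "fatty liver", "fatty liver disease"
    print(exact_match(pred, ref), token_f1(pred, ref), bleu1(pred, ref))
```

Under these definitions, a prediction that is nearly right still scores zero on exact match, which is one plausible reason exact match can sit far below BLEU-1 and F1 on free-form answers.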


This review was generated by AI and checked by a human editor.