QUICK REVIEW

[Paper Review] PathVQA: 30000+ Questions for Medical Visual Question Answering

Xuehai He, Yichen Zhang | arXiv (Cornell University) | Mar 7, 2020

Multimodal Machine Learning Applications | References: 33 | Citations: 85

One-Sentence Summary

The paper introduces PathVQA, a pathology VQA dataset of 32,799 QA pairs generated semi-automatically from 4,998 images, and provides baselines from several VQA models.

ABSTRACT

Is it possible to develop an "AI Pathologist" to pass the board-certified examination of the American Board of Pathology? To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer. Our work makes the first attempt to build such a dataset. Different from creating general-domain VQA datasets where the images are widely accessible and there are many crowdsourcing workers available and capable of generating question-answer pairs, developing a medical VQA dataset is much more challenging. First, due to privacy concerns, pathology images are usually not publicly available. Second, only well-trained pathologists can understand pathology images, but they barely have time to help create datasets for AI research. To address these challenges, we resort to pathology textbooks and online digital libraries. We develop a semi-automated pipeline to extract pathology images and captions from textbooks and generate question-answer pairs from captions using natural language processing. We collect 32,799 open-ended questions from 4,998 pathology images where each question is manually checked to ensure correctness. To our best knowledge, this is the first dataset for pathology VQA. Our dataset will be released publicly to promote research in medical VQA.

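Motivation and Objectives

Build a pathology VQA dataset as a first step toward AI that can answer American Board of Pathology (ABP) style questions.
Sidestep the privacy and annotation bottlenecks by drawing on publicly available pathology textbooks and digital libraries.
Provide baseline VQA benchmarks using several state-of-the-art models for comparison.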

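Proposed Method

Extract pathology images and captions from online textbooks and the PEIR Digital Library.
Semi-automatically generate question-answer pairs from the captions using NLP and rule-based transducers.
Manually proofread and post-process the generated questions to fix grammatical and semantic errors.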
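The caption-to-question step above can be illustrated with a toy rule. The following is a minimal sketch, not the authors' pipeline: the regular expression, the function names, and the example captions are all hypothetical, and a real transducer would rely on proper linguistic analysis rather than a single surface pattern.

```python
import re
from typing import List, Tuple

# Hypothetical toy rule: "<subject> showing/shows <finding>".
# This is an illustration only, not the transducer used for PathVQA.
SHOWING_RULE = re.compile(
    r"^(?P<subject>.+?)\s+(?:showing|shows)\s+(?P<finding>.+?)\.?$",
    re.IGNORECASE,
)


def _normalize_subject(subject: str) -> str:
    """Drop a leading article and lowercase the first letter of ordinary words."""
    subject = re.sub(r"^(?:the|an|a)\s+", "", subject.strip(), flags=re.IGNORECASE)
    # Keep acronyms such as "CT" intact: only lowercase when the second
    # character is already lowercase.
    if len(subject) > 1 and subject[1].islower():
        subject = subject[0].lower() + subject[1:]
    return subject


def generate_qa_pairs(caption: str) -> List[Tuple[str, str]]:
    """Turn one caption into (question, answer) pairs, or [] if no rule fires."""
    match = SHOWING_RULE.match(caption.strip())
    if match is None:
        return []
    subject = _normalize_subject(match.group("subject"))
    finding = match.group("finding").strip()
    return [
        (f"What does the {subject} show?", finding),        # open-ended question
        (f"Does the {subject} show {finding}?", "yes"),     # yes/no question
    ]


if __name__ == "__main__":
    for question, answer in generate_qa_pairs("Section of liver showing fatty change."):
        print(question, "->", answer)
```

Captions that match no rule return an empty list, and questions that come out ungrammatical would be caught in the manual proofreading step described above.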

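Experimental Results

Research Questions

RQ1: Can an AI system answer board-style pathology questions from images and their associated captions?
RQ2: How large and diverse is a pathology VQA dataset that can be shared with the research community?
RQ3: What baseline performance do current VQA models achieve on open-ended and yes/no pathology questions?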

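Key Findings

The PathVQA dataset contains 32,799 QA pairs for 4,998 images.
There are 7 question categories; 49.8% of the questions are yes/no and 40.9% are open-ended "what"-type questions.
Baseline yes/no accuracy ranges from 57.6% to 68.2% across methods.
Open-ended QA metrics (BLEU-1/2/3, exact match, F1) are relatively low (e.g., BLEU-1 peaks at 32.4, and exact match is as low as 2.9%).
Method 1, which uses bilinear attention with region proposals, achieves the best yes/no performance (68.2%).
Method 3, which uses Faster R-CNN, improves over holistic image features (62.0% on yes/no).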
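To make the reported metrics concrete, here is a minimal sketch of yes/no accuracy, exact match, and token-level F1. It assumes lowercase whitespace tokenization and a SQuAD-style F1. The review does not state the paper's exact metric definitions, so treat these functions and their names as illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def _tokens(text: str) -> List[str]:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(_tokens(prediction) == _tokens(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = _tokens(prediction), _tokens(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def yes_no_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of yes/no questions answered correctly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    print(exact_match("Fatty Change", "fatty change"))
    print(token_f1("fatty change", "fatty change of liver"))
    print(yes_no_accuracy(["yes", "no", "yes"], ["yes", "yes", "yes"]))
```

Exact match gives no partial credit, which is one reason it can sit far below F1 or BLEU-1 on free-text answers: a prediction that gets most of the words right still scores zero.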


This review was generated by AI and checked by a human editor.