QUICK REVIEW

[Paper Review] Dynamic Memory Networks for Visual and Textual Question Answering

Caiming Xiong, Stephen Merity|arXiv (Cornell University)|Mar 4, 2016

Multimodal Machine Learning Applications | References: 3 | Citations: 593

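One-Line Summary

The paper extends Dynamic Memory Networks (DMN) to visual question answering by adding an image input module. It also improves the memory and input representations, and it achieves state-of-the-art results on VQA and bAbI-10k without supporting-fact supervision.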

ABSTRACT

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

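Motivation and Goals

Extend the DMN to visual and textual question answering without requiring labeled supporting facts.
Improve the input representations for both text and images so that they better capture interactions between inputs and global context.
Strengthen the memory update mechanism to better support reasoning over multiple episodic passes.
Demonstrate state-of-the-art performance on both the VQA dataset and the bAbI-10k text QA dataset.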

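Proposed Method

Introduce an input fusion layer into the text input module so that sentences can interact with one another through a bi-directional GRU.
Develop an input module for images that divides each image into a 14x14 grid of local regions, projects the region features into the text feature space, and applies a bi-directional GRU over the regions to capture context across the whole image.
Replace the standard DMN attention with an attention-based GRU that uses the attention gate to update its hidden states (Eq. 11).
Update the episodic memory by passing the context vector c^t and the previous memory through the memory update (Eq. 12) or, optionally, an untied ReLU-based update (Eq. 13).
Compare soft attention with the attention-based GRU, and adopt the latter in DMN+.
Train and evaluate on bAbI-10k, DAQUAR-ALL, and VQA, and compare against state-of-the-art approaches.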
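
The mechanisms listed above can be illustrated with a minimal pure-Python sketch. It is written under our own assumptions and is not the authors' code: biases are omitted, weights are small random matrices, the attention gates are hand-picked rather than produced by the paper's gate network, the two GRU directions in the fusion layer are assumed to be summed, image regions are toy 8-dimensional vectors on a 14x14 grid in raster order with an assumed tanh projection, and the memory is assumed to start from the question vector. All helper names (`GRUCell`, `input_fusion`, `project_regions`, `attention_gru_context`, `soft_attention_context`, `untied_relu_update`) are hypothetical.

```python
import math
import random

# Hypothetical sketch of DMN+-style components (assumptions: no biases,
# random toy weights, hand-picked attention gates, tiny dimensions).

def matvec(W, x):
    """Return W @ x, where W is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    def __init__(self, in_dim, hid_dim, rng):
        self.Wu, self.Uu = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wr, self.Ur = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.W, self.U = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h, gate=None):
        r = sigmoid(vadd(matvec(self.Wr, x), matvec(self.Ur, h)))  # reset gate
        # Candidate state: tanh(W x + r * (U h))
        cand = [math.tanh(a + ri * b)
                for a, ri, b in zip(matvec(self.W, x), r, matvec(self.U, h))]
        if gate is None:
            u = sigmoid(vadd(matvec(self.Wu, x), matvec(self.Uu, h)))  # learned update gate
        else:
            u = [gate] * len(h)  # attention-based GRU: scalar g_i replaces the update gate
        return [ui * ci + (1.0 - ui) * hi for ui, ci, hi in zip(u, cand, h)]

def run_gru(cell, xs, hid_dim, gates=None):
    """Run a GRU over a sequence from a zero state; return every hidden state."""
    h, states = [0.0] * hid_dim, []
    for i, x in enumerate(xs):
        h = cell.step(x, h, None if gates is None else gates[i])
        states.append(h)
    return states

def input_fusion(xs, fwd, bwd, hid_dim):
    """Bi-directional GRU over sentences or regions; directions summed (assumed)."""
    forward = run_gru(fwd, xs, hid_dim)
    backward = run_gru(bwd, xs[::-1], hid_dim)[::-1]
    return [vadd(a, b) for a, b in zip(forward, backward)]

def project_regions(regions, W_img):
    """Project region features into the text feature space (tanh assumed)."""
    return [[math.tanh(z) for z in matvec(W_img, r)] for r in regions]

def attention_gru_context(cell, facts, gates, hid_dim):
    """Context c^t = final hidden state of the attention-based GRU (order-sensitive)."""
    return run_gru(cell, facts, hid_dim, gates)[-1]

def soft_attention_context(facts, gates):
    """Soft-attention alternative: c^t = sum_i g_i * f_i (ignores fact order)."""
    return [sum(g * f[k] for g, f in zip(gates, facts)) for k in range(len(facts[0]))]

def untied_relu_update(W_t, m_prev, c, q):
    """Untied update m^t = ReLU(W^t [m^{t-1}; c^t; q]), bias omitted."""
    return [max(0.0, z) for z in matvec(W_t, m_prev + c + q)]

# --- Toy usage ---
rng = random.Random(0)
d_in, d = 4, 3
sentences = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(5)]
facts = input_fusion(sentences, GRUCell(d_in, d, rng), GRUCell(d_in, d, rng), d)

q = [0.1, -0.2, 0.3]                 # hypothetical question vector
gates = [0.05, 0.7, 0.05, 0.1, 0.1]  # hypothetical gates that sum to 1
attn_cell = GRUCell(d, d, rng)
c = attention_gru_context(attn_cell, facts, gates, d)
m1 = untied_relu_update(rand_matrix(d, 3 * d, rng), q, c, q)  # m^0 = q (assumed)
m1_tied = GRUCell(d, d, rng).step(c, q)  # tied GRU update m^t = GRU(c^t, m^{t-1})

# Image side: a 14x14 grid of toy 8-dim region features
regions = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(14 * 14)]
image_facts = input_fusion(project_regions(regions, rand_matrix(d_in, 8, rng)),
                           GRUCell(d_in, d, rng), GRUCell(d_in, d, rng), d)
```

Setting every gate to 0 leaves the attention-based GRU's state unchanged, whereas soft attention with a one-hot gate simply returns the selected fact. Only the GRU variant depends on the order in which facts arrive, which is consistent with the finding that it helps on questions involving position or ordering.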

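Experimental Results

Research Questions

RQ1: Can the DMN be extended to visual question answering without supporting-fact annotations?
RQ2: Do the improvements to the input module and the memory update generalize across both text QA and VQA tasks?
RQ3: How do the different attention mechanisms, soft attention and the attention-based GRU, affect reasoning in DMN+?
RQ4: Does untied memory weighting help or hurt performance across tasks?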

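Key Findings

DMN+ achieves higher accuracy than earlier DMN variants on DAQUAR-ALL and VQA without requiring labeled supporting facts.
The input fusion layer improves interactions between distant facts or sentences and between image regions, boosting QA performance on both text and images.
The attention-based GRU improves the handling of questions that require complex positional or ordering reasoning, especially in text QA.
Untied memory weights and the ReLU memory update provide additional gains on average but may cause overfitting on some tasks.
Overall, DMN+ achieves state-of-the-art results on both the VQA and bAbI-10k datasets and outperforms end-to-end memory networks and the Neural Reasoner on several tasks.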

This review was generated by AI and checked by a human editor.