QUICK REVIEW

[Paper Review] RUBi: Reducing Unimodal Biases in Visual Question Answering

Rémi Cadène, Corentin Dancette | arXiv (Cornell University) | Jun 24, 2019

Multimodal Machine Learning Applications | References: 48 | Cited by: 205

One-Sentence Summary

RUBi adds a question-only branch during VQA training to reduce unimodal biases, which improves robustness on biased datasets such as VQA-CP v2.

ABSTRACT

Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This leads to dynamically adjusting the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than what was seen during training. Our code is available: github.com/cdancette/rubi.bootstrap.pytorch

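Motivation and Objectives

Motivate the need to reduce question-modality biases, which let VQA models ignore the image.
Propose a learning strategy (RUBi) that down-weights biased examples during training.
Show that RUBi is model-agnostic and improves performance across different architectures.
Demonstrate robustness gains on a biased benchmark while keeping results comparable on VQA-v2.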

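Proposed Method

During training, attach a question-only branch to a base VQA model to capture language biases.
Compute a mask from the question-only branch and apply it multiplicatively to the VQA model's output before the loss is computed.
Jointly optimize the parameters of the base VQA model and the question-only branch with two losses: L_QM (the main loss) and L_QO (the question-only loss).
Remove the question-only branch after training; at inference, use only the base VQA model.
Demonstrate compatibility with architectures such as SAN and UpDn, and report improvements on VQA-CP v2.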
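
The masking and two-loss steps above can be sketched for a single training example in plain Python. This is a minimal illustration, not the authors' implementation. It assumes the mask is a sigmoid of the question-only branch output, that both losses are softmax cross-entropies, and that they are summed; none of these details are stated in this review. The names `rubi_losses`, `q_branch_out`, `q_only_logits`, and `predict` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; `target` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def rubi_losses(vqa_logits, q_branch_out, q_only_logits, target):
    """Return (L_QM, L_QO, total) for one example.

    vqa_logits:    base VQA model scores, one per candidate answer
    q_branch_out:  question-only branch output used to build the mask
    q_only_logits: question-only answer scores (hypothetical classifier head)
    """
    # Assumption: the mask is a per-answer sigmoid, so each entry lies in (0, 1).
    mask = [sigmoid(z) for z in q_branch_out]
    # Apply the mask multiplicatively to the VQA output before the loss.
    masked_logits = [v * m for v, m in zip(vqa_logits, mask)]
    l_qm = cross_entropy(masked_logits, target)  # main loss
    l_qo = cross_entropy(q_only_logits, target)  # question-only loss
    return l_qm, l_qo, l_qm + l_qo  # assumption: losses are summed


def predict(vqa_logits):
    """Inference uses only the base VQA output; the branch is dropped."""
    return max(range(len(vqa_logits)), key=lambda i: vqa_logits[i])


# Toy example: the question alone strongly suggests answer 0.
vqa_logits = [2.0, 1.0, 0.5]
q_branch_out = [3.0, -3.0, -3.0]

# Biased example (true answer 0): masking lowers the main loss.
biased_l_qm, _, _ = rubi_losses(vqa_logits, q_branch_out, q_branch_out, 0)
# Counter-bias example (true answer 1): masking raises the main loss.
counter_l_qm, _, _ = rubi_losses(vqa_logits, q_branch_out, q_branch_out, 1)

print(biased_l_qm < cross_entropy(vqa_logits, 0))
print(counter_l_qm > cross_entropy(vqa_logits, 1))
```

In this toy case the mask shrinks the loss on the example that the question alone already answers and enlarges it on the example that contradicts the bias, which is the down-weighting behavior the summary describes.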

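Experimental Results

Research Questions

RQ1: How much question-only bias can be captured and mitigated during training to improve VQA robustness?
RQ2: Does RUBi improve performance across different VQA architectures on both biased and standard datasets?
RQ3: How does the proposed masking strategy affect learning dynamics and bias reduction?
RQ4: To what extent does reducing unimodal biases affect performance on the standard VQA-v2 benchmark?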

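Key Findings

RUBi reaches an average overall accuracy of 47.11% on VQA-CP v2, a gain of +5.94 percentage points over the previous state of the art.
RUBi also helps across architectures: +11.73 points for SAN and +4.5 points for UpDn relative to the corresponding baselines.
On VQA-CP v2, RUBi outperforms the baseline by +8.65 points and far exceeds bias-focused methods such as GVQA.
RUBi stays competitive on VQA-v2 while achieving large gains on VQA-CP v2, indicating robustness to question biases without a severe cost to standard accuracy.
Ablation studies confirm that both L_QO and the masking approach are necessary for bias reduction.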
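
The gains quoted above imply reference accuracies by simple subtraction. The short Python sketch below uses only figures from this review; the derived values are inferred rather than quoted, and it assumes the +8.65 gain is measured against the same 47.11% result.

```python
# Figures quoted in the findings above (VQA-CP v2, overall accuracy in %).
rubi_acc = 47.11
gains = {"previous state of the art": 5.94, "baseline": 8.65}

# Implied reference accuracy = RUBi accuracy minus the reported gain.
implied = {name: round(rubi_acc - gain, 2) for name, gain in gains.items()}

for name, acc in implied.items():
    print(f"{name}: about {acc:.2f}%")
```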


This review was generated by AI and checked by a human editor.