QUICK REVIEW

[Paper Review] Counterfactual VQA: A Cause-Effect Look at Language Bias

Yulei Niu, Kaihua Tang|arXiv (Cornell University)|Jun 8, 2020

Multimodal Machine Learning Applications | References: 68 | Cited by: 23

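One-Sentence Summary

This paper proposes CF-VQA, a counterfactual inference framework that mitigates language bias in visual question answering (VQA) by modeling the direct causal effect of the question on the answer and subtracting that effect from the total causal effect. Without any data augmentation, the method achieves competitive performance on VQA-CP, applies to a range of backbones and fusion strategies, and remains robust on the balanced VQA v2 benchmark.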

ABSTRACT

VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language. Recent debiasing methods proposed to exclude the language prior during inference. However, they fail to disentangle the "good" language context and "bad" language bias from the whole. In this paper, we investigate how to mitigate language bias in VQA. Motivated by causal effects, we propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers and reduce the language bias by subtracting the direct language effect from the total causal effect. Experiments demonstrate that our proposed counterfactual inference framework 1) is general to various VQA backbones and fusion strategies, and 2) achieves competitive performance on the language-bias-sensitive VQA-CP dataset while performing robustly on the balanced VQA v2 dataset without any augmented data. The code is available at https://github.com/yuleiniu/cfvqa.

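Motivation and Objectives

Address language bias in VQA models that rely on spurious language correlations rather than multimodal reasoning.
Separate "good" language context from "bad" language bias, which existing debiasing methods fail to do.
Develop a generalizable inference framework that reduces language bias without data augmentation or architectural changes.
Unify existing language-prior-based methods under a causal inference framework, so that they can be improved with minimal modification.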

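Proposed Method

Use counterfactual inference to model language bias as the direct causal effect of the question on the answer.
Estimate the total causal effect with conventional VQA, which uses both the visual and the language input.
Estimate the pure language effect with counterfactual VQA, which blocks the visual input to isolate the influence of the question alone.
Compute the debiased prediction by subtracting the direct language effect from the total effect.
During training, use an ensemble model with vision-language, language-only, and vision-only branches.
At inference, correct the bias by subtracting the estimated direct language effect from the fused prediction.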
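
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the branch logits `z_q` (language-only), `z_v` (vision-only) and `z_k` (vision-language) are toy values, the SUM fusion is assumed to be a log-sigmoid of the summed logits, and the constant `c` that stands in for the blocked branches is fixed to 0 here rather than learned. All function and variable names are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def fuse_sum(z_q, z_v, z_k):
    # Assumed SUM fusion: log-sigmoid of the summed branch logits.
    return log_sigmoid(z_q + z_v + z_k)

def debiased_scores(z_q, z_v, z_k, c=0.0):
    """Fused score minus the question-only (counterfactual) score."""
    # Factual case: every branch sees its real input.
    factual = fuse_sum(z_q, z_v, z_k)
    # Counterfactual case: the question branch is kept, while the vision-only
    # and vision-language branches are replaced by the constant c.
    blocked = np.full_like(z_q, c)
    counterfactual = fuse_sum(z_q, blocked, blocked)
    # Subtracting removes the part of the score explained by the question alone.
    return factual - counterfactual

# Toy case: "What color is the banana?" with an image of a green banana.
answers = ["yellow", "green"]
z_q = np.array([3.0, 0.0])  # language prior strongly favors "yellow"
z_v = np.array([0.0, 0.0])  # vision-only branch is uninformative here
z_k = np.array([0.0, 2.0])  # vision-language branch favors "green"

plain = fuse_sum(z_q, z_v, z_k)
debiased = debiased_scores(z_q, z_v, z_k)

print("fused only:", answers[int(np.argmax(plain))])
print("debiased:  ", answers[int(np.argmax(debiased))])
```

In this toy case the fused score alone prefers "yellow" because the language-only logit dominates, while the debiased score prefers "green", the answer supported by the vision-language branch.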

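Experimental Results

Research Questions

RQ1: How can language bias in VQA be effectively separated from useful language context?
RQ2: Can a counterfactual inference framework reduce language bias without data augmentation?
RQ3: How well does the proposed method generalize across VQA architectures and fusion strategies?
RQ4: Can existing language-prior-based methods be unified and improved under a causal inference framework?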

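Key Findings

On the VQA-CP v1 test set, CF-VQA with the SUM strategy reaches 52.87% accuracy, 7.5% higher than the RUBi baseline.
On VQA-CP v2, CF-VQA (SUM) reaches 52.73% accuracy and outperforms RandImg by more than 3% in the in-domain setting.
The method generalizes across backbones (SAN, UpDn, S-MRL) and fusion strategies (HM, SUM), improving performance consistently.
By adding only a single learnable parameter, CF-VQA improves RUBi by 7.5%, showing that it is compatible with existing methods and can enhance them.
Ablation studies indicate that CF-VQA substantially reduces language bias while preserving visual understanding, with consistent gains across all model variants.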
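
To illustrate what swapping the fusion strategy might look like, the sketch below applies the same subtraction under two fusion functions. The functional forms used for HM (the log of p / (1 + p), with p the product of per-branch sigmoids) and SUM (the log-sigmoid of the summed logits) are assumptions about how the paper defines them and should be checked against it. The logits, the constant `c = 0`, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_hm(z_q, z_v, z_k):
    # Assumed HM fusion: product of per-branch sigmoids, mapped to log(p / (1 + p)).
    p = sigmoid(z_q) * sigmoid(z_v) * sigmoid(z_k)
    return np.log(p / (1.0 + p))

def fuse_sum(z_q, z_v, z_k):
    # Assumed SUM fusion: log-sigmoid of the summed branch logits.
    return np.log(sigmoid(z_q + z_v + z_k))

def debias(fuse, z_q, z_v, z_k, c=0.0):
    # Same subtraction for any fusion function: the fused score minus the score
    # obtained when only the question branch sees its real input.
    blocked = np.full_like(z_q, c)
    return fuse(z_q, z_v, z_k) - fuse(z_q, blocked, blocked)

# Toy logits over two candidate answers (index 0 is the language-prior answer).
z_q = np.array([3.0, 0.0])
z_v = np.array([0.0, 0.0])
z_k = np.array([0.0, 2.0])

results = {
    name: debias(fuse, z_q, z_v, z_k)
    for name, fuse in [("HM", fuse_hm), ("SUM", fuse_sum)]
}
for name, scores in results.items():
    print(name, "debiased scores:", scores, "-> answer index", int(np.argmax(scores)))
```

With these toy logits both fusion functions rank the language-prior answer first before the subtraction and the visually supported answer first after it. The debiasing step itself does not depend on which fusion function is plugged in.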


This review was generated by AI and reviewed by a human editor.