QUICK REVIEW

[Paper Review] Unbiased Scene Graph Generation from Biased Training

Kaihua Tang, Yulei Niu|arXiv (Cornell University)|Feb 27, 2020

Multimodal Machine Learning Applications | References: 71 | Cited by: 39

One-Sentence Summary

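This work introduces a causal-inference framework that uses the Total Direct Effect (TDE) to remove harmful context bias from scene graph generation (SGG) predictions while preserving the useful "good" bias. The approach is model-agnostic and yields significant improvements on the Visual Genome benchmark.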

ABSTRACT

Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach". Given such SGG, the down-stream tasks such as VQA can hardly infer better scene structures than merely a bag of objects. However, debiasing in SGG is not trivial because traditional debiasing methods cannot distinguish between the good and bad bias, e.g., good context prior (e.g., "person read book" rather than "eat") and bad long-tailed bias (e.g., "near" dominating "behind / in front of"). In this paper, we present a novel SGG framework based on causal inference but not the conventional likelihood. We first build a causal graph for SGG, and perform traditional biased training with the graph. Then, we propose to draw the counterfactual causality from the trained graph to infer the effect from the bad bias, which should be removed. In particular, we use Total Direct Effect (TDE) as the proposed final predicate score for unbiased SGG. Note that our framework is agnostic to any SGG model and thus can be widely applied in the community who seeks unbiased predictions. By using the proposed Scene Graph Diagnosis toolkit on the SGG benchmark Visual Genome and several prevailing models, we observed significant improvements over the previous state-of-the-art methods.

Motivation and Objectives

Motivate the need to mitigate biased relationship predictions in SGG, which arise from long-tailed and language biases.
Propose a causal-inference framework that distinguishes good context priors from harmful bias.
Introduce Total Direct Effect (TDE) as the final unbiased predicate score.
Show that TDE-empowered predictions improve over state-of-the-art debiasing methods across multiple SGG models.

Proposed Method

Construct a general causal graph for SGG in which object features X (content), object classes Z (context), and the input image I (scene) each influence the predicate Y.
Train the model on this causal graph in the conventional, biased (likelihood-based) way.
Define and compute the Total Direct Effect (TDE) as Y_x(u) - Y_{\bar{x},z}(u) to obtain unbiased predictions.
Show TDE is model-agnostic and can be integrated with existing SGG architectures without extra parameters.
Introduce Scene Graph Diagnosis toolkit including bias-sensitive metrics (mean Recall) and Sentence-to-Graph Retrieval (S2GR).
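
The TDE step above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: the three predicate classes, the branch logits, the sum-then-softmax fusion, and the choice to "wipe" the object features by zeroing their logits are all hypothetical choices made for illustration. The factual prediction Y_x(u) uses all three branches; the counterfactual Y_{\bar{x},z}(u) removes the object-feature branch while keeping the class and image branches at their factual values; the TDE is their difference.

```python
import math

# Hypothetical predicate vocabulary and branch logits (illustrative values only).
PREDICATES = ["on", "sitting on", "near"]
X_LOGITS = [0.5, 1.5, 0.0]  # object-feature branch: visual evidence favors "sitting on"
Z_LOGITS = [2.0, 0.5, 1.0]  # object-class branch: context prior favors the head class "on"
I_LOGITS = [0.0, 0.0, 0.0]  # image (scene) branch, kept neutral in this toy


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x_logits, z_logits, i_logits):
    """Assumed fusion: sum the branch logits, then apply softmax."""
    fused = [x + z + i for x, z, i in zip(x_logits, z_logits, i_logits)]
    return softmax(fused)


def tde_scores(x_logits, z_logits, i_logits):
    """TDE = factual prediction minus counterfactual prediction with x wiped out."""
    factual = predict(x_logits, z_logits, i_logits)
    wiped_x = [0.0] * len(x_logits)  # assumption: zeros stand in for the wiped features
    counterfactual = predict(wiped_x, z_logits, i_logits)
    return [f - c for f, c in zip(factual, counterfactual)]


def argmax_label(scores):
    """Return the predicate name with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return PREDICATES[best]


biased_label = argmax_label(predict(X_LOGITS, Z_LOGITS, I_LOGITS))
unbiased_label = argmax_label(tde_scores(X_LOGITS, Z_LOGITS, I_LOGITS))
print(biased_label, "->", unbiased_label)  # prints: on -> sitting on
```

In this toy, the context prior pushes the biased prediction to the frequent predicate "on", while subtracting the counterfactual leaves only the contribution that depends on the observed object features. No parameters are added or retrained, which matches the summary's point that TDE is applied at inference time on top of an already trained model.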

Experimental Results

Research Questions

RQ1: How can we isolate and remove the influence of biased context in SGG predictions while preserving useful priors?
RQ2: Does a counterfactual-based TDE predictor improve predicate-level and graph-level SGG performance across different models?
RQ3: Are debiasing methods that do not distinguish good vs. bad bias less effective or non-generalizable to unseen relationships?
RQ4: Can a model-agnostic TDE approach improve downstream tasks relying on SGG (e.g., VQA, captioning) by providing more discriminative relations?

Key Findings

TDE substantially improves predicate-level predictions across multiple models and fusion strategies compared to biased baselines.
TDE reduces the effect of long-tailed bias: the gains are spread across predicates rather than driven solely by head predicates.
TE (total effect) and NIE (natural indirect effect) show limited gains, whereas TDE consistently improves mean Recall@K on the Relationship Retrieval (RR) and Zero-Shot Relationship Retrieval (ZSRR) tasks.
S2GR demonstrates that TDE yields more discriminative and semantically informative relations, improving sentence-to-graph retrieval.
The Scene Graph Diagnosis toolkit confirms severe bias in existing models and the effectiveness of TDE on the Visual Genome benchmark.
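
To see why mean Recall is described as bias-sensitive while plain Recall is not, consider the following simplified sketch. It assumes each ground-truth relationship is reduced to a (predicate, was_recalled) pair, which ignores the per-image top-K ranking used in the actual benchmark; the function names and the two toy result sets are hypothetical.

```python
from collections import defaultdict


def recall(records):
    """Fraction of all ground-truth relationships that were recalled."""
    hits = sum(1 for _, hit in records if hit)
    return hits / len(records)


def mean_recall(records):
    """Average of per-predicate recalls, so each predicate class counts equally."""
    totals = defaultdict(int)
    hit_counts = defaultdict(int)
    for predicate, hit in records:
        totals[predicate] += 1
        hit_counts[predicate] += int(hit)
    per_predicate = [hit_counts[p] / totals[p] for p in totals]
    return sum(per_predicate) / len(per_predicate)


# Toy data: 8 ground-truth "on" (head class) and 2 "sitting on" (tail class).
# The biased model recalls every "on" but no "sitting on".
biased = [("on", True)] * 8 + [("sitting on", False)] * 2
# The debiased model loses two "on" but recovers both "sitting on".
debiased = [("on", True)] * 6 + [("on", False)] * 2 + [("sitting on", True)] * 2

print(recall(biased), mean_recall(biased))      # 0.8 0.5
print(recall(debiased), mean_recall(debiased))  # 0.8 0.875
```

Both toy models have the same overall Recall, but mean Recall separates them because the tail predicate carries the same weight as the head predicate.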

This review was generated by AI and reviewed by human editors.