QUICK REVIEW

[Paper Review] Learning Perceptual Inference by Contrasting

Chi Zhang, Baoxiong Jia|arXiv (Cornell University)|Nov 29, 2019

Cognitive Science and Mapping | Cited by 40

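One-Sentence Summary

CoPINet introduces a permutation-invariant contrastive perception framework, paired with a rule-inference module, to solve Raven's Progressive Matrices; it achieves state-of-the-art results among permutation-invariant models on the RAVEN and PGM datasets.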

ABSTRACT

"Thinking in pictures," [1] i.e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a significant ability to perform logical induction and a crucial factor in the intellectual history of technology development. Modern Artificial Intelligence (AI), fueled by massive datasets, deeper models, and mighty computation, has come to a stage where (super-)human-level performances are observed in certain specific tasks. However, current AI's ability in "thinking in pictures" is still far lacking behind. In this work, we study how to improve machines' reasoning ability on one challenging task of this kind: Raven's Progressive Matrices (RPM). Specifically, we borrow the very idea of "contrast effects" from the field of psychology, cognition, and education to design and train a permutation-invariant model. Inspired by cognitive studies, we equip our model with a simple inference module that is jointly trained with the perception backbone. Combining all the elements, we propose the Contrastive Perceptual Inference network (CoPINet) and empirically demonstrate that CoPINet sets the new state-of-the-art for permutation-invariant models on two major datasets. We conclude that spatial-temporal reasoning depends on envisaging the possibilities consistent with the relations between objects and can be solved from pixel-level inputs.

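Research Motivation and Goals

Advance spatial-temporal and relational reasoning on RPM tasks beyond pure perception.
Incorporate an explicit contrast mechanism to compare candidate answers and distill discriminative features.
Enforce permutation invariance so the model cannot rely on candidate order or grid position.
Jointly train a simple inference module with the perception backbone to capture hidden rules.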

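Proposed Method

Introduce two levels of contrast. Model-level contrast computes Contrast(F_{O∪a}) = F_{O∪a} − h(Σ_{a′∈A} F_{O∪a′}), implemented as a contrast module that preserves permutation invariance.
Perform objective-level contrast with a Noise-Contrastive Estimation (NCE) variant that uses a baseline b(·), raising the potential of the correct candidate relative to the incorrect ones; the optimized loss is sigmoid-based (Eq. 8) and favors larger margins.
Add a perceptual inference branch that uses the observations O to jointly infer the hidden rules T: it models p(T|O) and samples T̂ to condition the final score f(O∪a, T̂).
Ensure permutation invariance by design, through a shared encoder and repeated contrast + residual blocks that do not depend on candidate order or row/column position.
Describe the CoPINet architecture: a perception branch with contrast modules and residual blocks, an inference branch with a (Gumbel-)SoftMax output, and an MLP that produces the negative potentials used in the contrastive objective.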
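
A minimal NumPy sketch of the two levels of contrast listed above, under stated assumptions. It is not the authors' implementation: `contrast`, `contrastive_loss`, and the default choice of `h` (a plain average of the summed features) are hypothetical stand-ins for learned components, and the loss is only one sigmoid-based form consistent with the description (score minus baseline, pushed up for the correct candidate and down for the others), not a transcription of Eq. 8.

```python
import numpy as np


def contrast(features, h=None):
    """Model-level contrast: F_a - h(sum over all candidates a' of F_a').

    features: array of shape (num_candidates, dim), one row per F_{O∪a}.
    h: stand-in for the learned map h; ASSUMPTION: defaults to averaging.
    """
    if h is None:
        h = lambda summed: summed / features.shape[0]
    # The sum over candidates does not depend on candidate order.
    summed = features.sum(axis=0, keepdims=True)
    return features - h(summed)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def contrastive_loss(scores, baselines, target):
    """Sigmoid-based contrastive loss to minimize (ASSUMED form).

    scores: f(O∪a) for each candidate, shape (num_candidates,).
    baselines: b(O∪a) for each candidate, same shape.
    target: index of the correct candidate.
    """
    logits = scores - baselines
    labels = np.zeros_like(logits)
    labels[target] = 1.0
    p = sigmoid(logits)
    eps = 1e-12  # guards against log(0)
    return -np.sum(labels * np.log(p + eps) + (1.0 - labels) * np.log(1.0 - p + eps))


# Toy usage with random features for 8 candidates.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))
contrasted = contrast(feats)
toy_scores = contrasted.sum(axis=1)  # hypothetical scoring head
loss = contrastive_loss(toy_scores, np.zeros(8), target=0)
```

Because the aggregate is a sum over candidates, shuffling the candidates only shuffles the rows of the output; the result for each candidate does not depend on where it sits in the list.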

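Experimental Results

Research Questions

RQ1: Does an explicit contrast mechanism improve RPM-style relational reasoning beyond perception-only models?
RQ2: Does permutation invariance prevent shortcut solutions based on candidate ordering and force genuine reasoning about relations?
RQ3: Does jointly learning perception with a simple inference module yield better generalization on RPM datasets?
RQ4: How does the contrastive objective with a baseline compare with standard cross-entropy in guiding RPM reasoning?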

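Key Findings

CoPINet achieves state-of-the-art performance among permutation-invariant models on the RAVEN and PGM datasets.
On the RAVEN dataset, CoPINet reaches 91.42% overall accuracy (above the 84.41% human baseline) and approaches human-level reasoning in some configurations.
On the PGM dataset, CoPINet achieves 56.37% overall accuracy, outperforming the other permutation-invariant baselines.
Ablation studies show that the contrast module, the contrast loss, and the perceptual inference branch each contribute significantly; removing the contrast module causes a drastic drop.
With reduced training data, performance remains strong: on RAVEN, CoPINet reaches near-human performance with significantly fewer examples, and on PGM it retains a significant improvement.
The results underscore the importance of permutation invariance for preventing shortcut learning from positional cues and for promoting genuine relational reasoning.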

This review was generated by AI and reviewed by human editors.