QUICK REVIEW

[Paper Review] On Mutual Information Maximization for Representation Learning

Michael Tschannen, Josip Djolonga|arXiv (Cornell University)|Jul 31, 2019

Domain Adaptation and Few-Shot Learning | References: 49 | Cited by: 219

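One-Sentence Summary

The paper questions MI maximization as the sole objective for unsupervised representation learning, shows that the inductive biases of the estimator and the architecture play an important role in the learned representations, and connects these observations to deep metric learning.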

ABSTRACT

Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.

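Research Motivation and Objectives

Examine unsupervised representation learning motivated by information-theoretic objectives, and assess the role that mutual information (MI) actually plays.
Show that maximizing bounds on MI can bias the encoder toward undesirable representations.
Demonstrate that the choice of estimator and the encoder architecture strongly affect downstream performance.
Offer an alternative explanation by connecting MI-based methods to deep metric learning and triplet losses.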

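Proposed Method

Formulate representation learning as maximizing a lower bound on the MI between two views of the data, using estimators such as InfoNCE and NWJ.
Run experiments with invertible and non-invertible encoders to observe how MI maximization affects downstream tasks.
Vary the critic architecture (bilinear, separable, MLP) to study its effect on the learned representation.
Compare encoder architectures (MLP vs. ConvNet) at matched values of the MI lower bound to isolate the effect of architecture.
Analyze the role of negative sampling in InfoNCE and NWJ and its effect on MI estimation and performance.
Connect MI-based objectives to triplet-based metric learning losses to reinterpret the results.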
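
The two estimators and the simpler critics named above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the commonly cited forms of the bounds, computed from a K x K score matrix whose diagonal holds scores of matched view pairs and whose off-diagonal entries serve as negatives. All function names, the toy data, and the identity weight matrix are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def bilinear_critic(z1, z2, w):
    """Bilinear critic: scores[i, j] = z1[i]^T W z2[j]."""
    return z1 @ w @ z2.T

def separable_critic(z1, z2, phi1, phi2):
    """Separable critic: scores[i, j] = phi1(z1[i])^T phi2(z2[j])."""
    return phi1(z1) @ phi2(z2).T

def infonce_lower_bound(scores):
    """InfoNCE estimate from a K x K score matrix (never exceeds log K)."""
    k = scores.shape[0]
    return float(np.mean(np.diag(scores) - logsumexp(scores, axis=1)) + np.log(k))

def nwj_lower_bound(scores):
    """NWJ estimate: E_joint[f] - e^{-1} * E_product-of-marginals[exp(f)]."""
    k = scores.shape[0]
    joint_term = np.mean(np.diag(scores))
    # Off-diagonal entries pair each x_i with a non-matching y_j (negatives).
    off_diag = scores[~np.eye(k, dtype=bool)]
    marginal_term = np.mean(np.exp(off_diag - 1.0))
    return float(joint_term - marginal_term)

# Toy example (assumption): two noisy "views" of the same 4-d representation.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 4))
z2 = z1 + 0.1 * rng.normal(size=(8, 4))
scores = bilinear_critic(z1, z2, np.eye(4))
print(infonce_lower_bound(scores), nwj_lower_bound(scores))
```

In this form the InfoNCE estimate is capped at log K for a batch of K pairs, which is one reason the number of negative samples matters for the estimate.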

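Experimental Results

Research Questions

RQ1: Does maximizing MI with common estimators reliably produce representations that are useful for downstream tasks?
RQ2: How do the encoder architecture and the choice of estimator bias the learned representation?
RQ3: What roles do the critic architecture and negative sampling play in MI-based representation learning?
RQ4: Can the observed success of MI-based methods be better explained through the principles of deep metric learning?
RQ5: Under what conditions do looser MI bounds yield better representations?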

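Key Findings

MI maximization does not guarantee good representations; some invertible encoders trained to maximize MI perform worse downstream than raw pixels.
Estimators such as InfoNCE and NWJ bias the encoder toward mappings that are hard to invert or ill-conditioned, which shapes the learned representation.
Higher-capacity critics can tighten the MI bound yet hurt downstream performance, whereas simpler critics (bilinear or separable) can improve it.
At matched values of the MI bound, the encoder architecture often matters more for downstream performance than the specific MI estimator.
A metric learning view based on triplet losses offers an alternative explanation for the empirical success, calling into question the primacy of MI as the objective.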
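
The metric learning reading can be checked numerically. The sketch below assumes the commonly cited form of InfoNCE on a K x K score matrix and verifies that it equals log K minus a multi-class K-pair loss, a generalization of the triplet loss used in deep metric learning. The function names and the random scores are hypothetical and serve only as an illustration.

```python
import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE estimate from a K x K score matrix with positives on the diagonal."""
    k = scores.shape[0]
    row_max = scores.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.mean(np.diag(scores) - lse) + np.log(k))

def k_pair_loss(scores):
    """Multi-class K-pair loss: mean_i log(1 + sum_{j != i} exp(s_ij - s_ii))."""
    k = scores.shape[0]
    # Difference between each negative score and the positive score in its row.
    diffs = scores - np.diag(scores)[:, None]
    off_diag = ~np.eye(k, dtype=bool)
    return float(np.mean(np.log1p(np.sum(np.exp(diffs) * off_diag, axis=1))))

# Random scores (assumption): any K x K matrix works for checking the identity.
rng = np.random.default_rng(0)
scores = rng.normal(size=(16, 16))
gap = infonce_lower_bound(scores) - (np.log(16) - k_pair_loss(scores))
print(abs(gap) < 1e-9)
```

With a single negative per anchor, each term reduces to log(1 + exp(s_neg - s_pos)), a smooth penalty that pushes the matched pair to score higher than the mismatched one, much as a triplet loss does. Under this reading, maximizing InfoNCE is the same optimization as minimizing the K-pair loss, since log K is constant for a fixed batch size.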

This review was generated by AI and reviewed by a human editor.