QUICK REVIEW

[Paper Review] Unsupervised Learning of Invariant Representations in Hierarchical Architectures

Fabio Anselmi, Joel Z. Leibo|arXiv (Cornell University)|Nov 17, 2013

Image Retrieval and Classification Techniques | References: 56 | Cited by: 65

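One-Sentence Summary

This paper proposes a hierarchical unsupervised learning framework built from Hubel-Wiesel-like modules that automatically learns invariant, discriminative representations with low sample complexity for visual object recognition. By computing pooled invariant signatures from the distributions of dot products between image patches and learned templates, the architecture achieves invariance to translation, scaling, and pose while remaining discriminative, which enables recognition from very few labeled examples. The mechanism is consistent with the principles of the primate ventral visual stream.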

ABSTRACT

The present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples ($n \to \infty$). The next phase is likely to focus on algorithms capable of learning from very few labeled examples ($n \to 1$), like humans seem able to do. We propose an approach to this problem and describe the underlying theory, based on the unsupervised, automatic learning of a ``good'' representation for supervised learning, characterized by small sample complexity ($n$). We consider the case of visual object recognition though the theory applies to other domains. The starting point is the conjecture, proved in specific cases, that image representations which are invariant to translations, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and unique (discriminative) signature can be computed for each image patch, $I$, in terms of empirical distributions of the dot-products between $I$ and a set of templates stored during unsupervised learning. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such estimates. Hierarchical architectures consisting of this basic Hubel-Wiesel moduli inherit its properties of invariance, stability, and discriminability while capturing the compositional organization of the visual world in terms of wholes and parts. The theory extends existing deep learning convolutional architectures for image and speech recognition. It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition---and that this representation may be continuously learned in an unsupervised way during development and visual experience.
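
The signature mentioned in the abstract can be written out explicitly. The following is a sketch in notation chosen here, not a quotation of the paper's theorems: $G$ is assumed to be a finite group of unitary transformations, $t^k$ ($k = 1, \dots, K$) are the stored templates, and $\eta_n$ ($n = 1, \dots, N$) is a family of pooling nonlinearities.

$$
\mu_n^k(I) = \frac{1}{|G|} \sum_{g \in G} \eta_n\big(\langle I, g\, t^k \rangle\big)
$$

For any $h \in G$, unitarity gives $\langle hI, g\, t^k \rangle = \langle I, h^{-1} g\, t^k \rangle$, and $g \mapsto h^{-1} g$ is a bijection of $G$, so the sum only gets reordered:

$$
\mu_n^k(hI) = \frac{1}{|G|} \sum_{g \in G} \eta_n\big(\langle I, h^{-1} g\, t^k \rangle\big) = \mu_n^k(I)
$$

Under these assumptions, choosing $\eta_n$ as a set of threshold functions yields an empirical distribution (histogram) of the dot products, $\eta(x) = x^2$ yields an energy-style pooling, and high powers approximate max pooling.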

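Motivation and Objectives

Address the challenge of learning from very few labeled examples (n → 1), as humans appear able to do, by reducing the sample complexity of visual recognition.
Develop a theory of unsupervised, automatic learning of invariant representations that are also stable and discriminative.
Formalize how a hierarchical architecture of Hubel-Wiesel modules achieves invariance to local affine transformations, including translation, scaling, and viewpoint changes.
Bridge insights from the primate visual cortex and deep learning, showing that the core function of the ventral stream is to build such invariant representations through continuous unsupervised learning.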

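Proposed Method

The method uses a hierarchical architecture of Hubel-Wiesel (HW) modules, each containing simple cells (filtering) and complex cells (pooling), to compute invariant signatures.
Each image patch yields a signature vector built from the empirical distribution of its dot products with the learned templates stored during unsupervised learning.
Pooling over the local receptive field, with a sum or max operation, provides invariance to translation and scaling and mimics the behavior of complex cells.
Through hierarchical composition, the architecture inherits invariance, stability, and discriminability; higher layers integrate the invariant features of lower layers into a global representation.
The framework extends standard convolutional networks by building invariance in as a structural property rather than a learned outcome; the templates themselves are learned from unlabeled data without supervision.
For more complex transformations such as 3D rotation or in-plane pose changes, the method introduces dedicated pooling layers that pool over stored views of objects undergoing those transformations.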
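
A single HW module as described above can be sketched in a few lines of NumPy. This is a toy illustration under loud assumptions, not the authors' implementation: signals are 1-D, the transformation group is the set of cyclic shifts (a stand-in for translation), templates are random vectors rather than learned ones, and the function name `hw_signature` and its parameters are hypothetical.

```python
import numpy as np

def hw_signature(patch, templates, pooling="hist", n_bins=8):
    """Toy HW module: filter with shifted templates, then pool over the orbit.

    Assumption: the group is the d cyclic shifts of a length-d signal.
    """
    d = patch.shape[0]
    patch = patch / np.linalg.norm(patch)  # keep dot products in [-1, 1]
    features = []
    for t in templates:
        t = t / np.linalg.norm(t)
        # "Stored" orbit of the template: every cyclic shift, shape (d, d).
        orbit = np.stack([np.roll(t, s) for s in range(d)])
        # Simple cells: one dot product per transformed template.
        responses = np.clip(orbit @ patch, -1.0, 1.0)
        # Complex cells: pool the responses over the whole orbit.
        if pooling == "hist":
            counts, _ = np.histogram(responses, bins=n_bins, range=(-1.0, 1.0))
            features.append(counts / d)  # empirical distribution
        elif pooling == "max":
            features.append(np.array([responses.max()]))
        elif pooling == "energy":
            features.append(np.array([np.mean(responses ** 2)]))
        else:
            raise ValueError(f"unknown pooling: {pooling}")
    return np.concatenate(features)

# Hypothetical demo data: 4 random templates and one random patch.
rng = np.random.default_rng(0)
templates = rng.standard_normal((4, 16))
patch = rng.standard_normal(16)
shifted = np.roll(patch, 5)  # a "translated" copy of the same patch

for mode in ("hist", "max", "energy"):
    same = np.allclose(hw_signature(patch, templates, mode),
                       hw_signature(shifted, templates, mode))
    print(mode, "signature unchanged by shift:", same)
```

Shifting the patch only permutes the set of dot products with the template orbit, so any pooling function that ignores order (histogram, max, mean of squares) returns the same value up to floating-point rounding. A hierarchy would feed such signatures, computed over small receptive fields, into the next layer's modules.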

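Experimental Results

Research Questions

RQ1: Can unsupervised learning of invariant representations significantly reduce the number of labeled examples needed for accurate visual recognition?
RQ2: How does a hierarchical architecture of Hubel-Wiesel modules achieve invariance to local affine transformations while preserving discriminative information?
RQ3: To what extent can invariant signatures, computed from the empirical distributions of dot products with learned templates, serve as unique, stable, and discriminative representations?
RQ4: Can the proposed architecture model the computational function of the primate ventral visual stream in producing transformation-invariant, hierarchical representations?
RQ5: Can invariance to complex transformations such as 3D rotation be achieved by learning to pool over stored views, rather than relying on architectural design alone?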

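Key Findings

The proposed hierarchical architecture achieves invariance to local affine transformations (including translation, scaling, and viewpoint changes) by design, without requiring labeled data to learn the invariance.
The signature vectors computed by HW modules are invariant to deformations within the receptive field, as verified by the consistency of the signature norm under image distortions such as changes in inter-eye distance.
The signature vectors are Lipschitz-stable under image deformations: small changes in the input produce only bounded changes in the representation, which ensures robustness.
The architecture remains discriminative: the signatures of different images (for example, two faces) stay distinct even when the images are translated across the visual field, enabling recognition from a single example.
Empirical results with an HMAX-like implementation show that layer-2 signatures are invariant to global translation and discriminative across faces, with Euclidean distances between signatures reflecting image similarity.
Models that integrate dedicated pooling over 3D rotation and in-plane pose changes achieved state-of-the-art performance on the Labeled Faces in the Wild dataset, robustly recognizing new faces from a single view and tolerating rotation in depth.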
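
The link between invariant signatures and recognition from a single example can be illustrated with a toy experiment. This sketch does not reproduce any result from the paper: it assumes 1-D random signals in place of faces, cyclic shifts in place of image translations, random templates, and hypothetical helper names (`orbit_responses`, `signature`, `one_shot_accuracy`). One exemplar per class is stored, and shifted queries are classified by nearest neighbor in Euclidean distance.

```python
import numpy as np

def orbit_responses(x, templates):
    """Dot products of x with every cyclic shift of every template: shape (K, d)."""
    d = x.shape[0]
    shifts = np.stack([np.roll(templates, s, axis=1) for s in range(d)], axis=1)
    return shifts @ x  # (K, d, d) @ (d,) -> (K, d)

def signature(x, templates):
    """Pool each template's orbit with three order-independent statistics."""
    r = orbit_responses(x / np.linalg.norm(x), templates)
    return np.concatenate([np.abs(r).mean(axis=1),
                           (r ** 2).mean(axis=1),
                           r.max(axis=1)])

def one_shot_accuracy(embed, exemplars, rng, trials=200):
    """Nearest-neighbor accuracy with a single stored example per class."""
    gallery = np.stack([embed(e) for e in exemplars])
    correct = 0
    for _ in range(trials):
        label = rng.integers(len(exemplars))
        # Query: the class exemplar under a random nonzero cyclic shift.
        query = np.roll(exemplars[label], rng.integers(1, exemplars.shape[1]))
        dists = np.linalg.norm(gallery - embed(query), axis=1)
        correct += int(np.argmin(dists) == label)
    return correct / trials

rng = np.random.default_rng(0)
templates = rng.standard_normal((8, 32))
templates /= np.linalg.norm(templates, axis=1, keepdims=True)
exemplars = rng.standard_normal((5, 32))  # one example for each of 5 classes

acc_invariant = one_shot_accuracy(lambda x: signature(x, templates), exemplars, rng)
acc_raw = one_shot_accuracy(lambda x: x, exemplars, rng)
print("invariant signature:", acc_invariant, "| raw signal:", acc_raw)
```

In this setup the signature of a shifted query matches the stored signature of its class up to rounding, so the invariant embedding identifies every query, while distances between raw signals carry no such guarantee. Real images add noise, clutter, and non-group transformations, which this sketch deliberately leaves out.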


This review was generated by AI and checked by a human editor.