QUICK REVIEW

[Paper Digest] Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery

Ian Goodfellow, Aaron Courville|arXiv (Cornell University)|Jan 16, 2012

Domain Adaptation and Few-Shot Learning | References: 22 | Cited by: 42

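One-Sentence Summary

This paper proposes spike-and-slab sparse coding (S3C), a hybrid generative model for unsupervised feature discovery that combines sparse coding with the spike-and-slab RBM (ssRBM). It uses structured variational inference for scalable GPU training and reaches 78.3% accuracy on CIFAR-10, comparable to sparse coding and better than the ssRBM. It also supports effective semi-supervised learning and won a transfer learning challenge.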

ABSTRACT

We consider the problem of using a factor model we call spike-and-slab sparse coding (S3C) to learn features for a classification task. The S3C model resembles both the spike-and-slab RBM and sparse coding. Since exact inference in this model is intractable, we derive a structured variational inference procedure and employ a variational EM training algorithm. Prior work on approximate inference for this model has not prioritized the ability to exploit parallel architectures and scale to enormous problem sizes. We present an inference procedure appropriate for use with GPUs which allows us to dramatically increase both the training set size and the amount of latent factors. We demonstrate that this approach improves upon the supervised learning capabilities of both sparse coding and the ssRBM on the CIFAR-10 dataset. We evaluate our approach's potential for semi-supervised learning on subsets of CIFAR-10. We demonstrate state-of-the-art self-taught learning performance on the STL-10 dataset and use our method to win the NIPS 2011 Workshop on Challenges In Learning Hierarchical Models' Transfer Learning Challenge.

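Research Motivation and Objectives

Develop a scalable, differentiable feature discovery method that combines the strengths of sparse coding and the spike-and-slab RBM.
Decouple binary spike variables from continuous slab variables, resolving the way sparse coding conflates sparsity with regularization of activation magnitude.
Use variational inference rather than MAP inference, so that the model can be integrated with deep generative models such as deep Boltzmann machines.
Validate the method on benchmark datasets in supervised, semi-supervised, and self-taught learning settings.
Achieve state-of-the-art performance in a transfer learning challenge while using only unlabeled data for feature learning.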

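Proposed Method

S3C models data with binary spike variables $h_i$ and continuous slab variables $s_i$; each $h_i$ gates the activation of its $s_i$, so the pair acts as a single joint hidden unit.
The model places a spike-and-slab prior on the hidden units: $p(h_i=1) = \sigma(b_i)$ and $p(s_i|h_i) = \mathcal{N}(s_i | h_i\mu_i, \alpha_{ii}^{-1})$, which gives independent control over sparsity and activation magnitude.
Visible data $v_d$ are generated by $p(v_d|s,h) = \mathcal{N}(v_d | W_{d:}(h \circ s), \beta_{dd}^{-1})$, where the columns of $W$ are constrained to unit norm to avoid over-parameterization.
A structured variational inference procedure approximates the true posterior $p(h,s|v)$ with a mean-field approximation, enabling efficient GPU-accelerated training.
The variational EM algorithm alternates between updating the variational parameters $\hat{h}, \hat{s}$ and optimizing the model parameters $W, \mu, \alpha, \beta, b$.
The inference step updates $\hat{h}_i$ in closed form from the expected log posterior and applies damping to improve convergence.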
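The generative process described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes diagonal precisions stored as vectors (`alpha`, `beta`), and the function names `sample_s3c`, `normalize_columns`, and `damped_update` are hypothetical. The damping helper shows only the generic form (a convex combination of the old and proposed values); the paper's actual closed-form update for $\hat{h}_i$ is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def normalize_columns(W):
    # Constrain each column of W to unit L2 norm.
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def sample_s3c(W, b, mu, alpha, beta, n_samples, rng):
    """Draw (v, h, s) from the S3C model, assuming diagonal precisions."""
    n_visible, n_hidden = W.shape
    # Spike: h_i ~ Bernoulli(sigmoid(b_i))
    h = (rng.random((n_samples, n_hidden)) < sigmoid(b)).astype(float)
    # Slab: s_i | h_i ~ N(h_i * mu_i, 1 / alpha_i)
    s = h * mu + rng.standard_normal((n_samples, n_hidden)) / np.sqrt(alpha)
    # Visible: v | h, s ~ N(W (h * s), diag(1 / beta))
    noise = rng.standard_normal((n_samples, n_visible)) / np.sqrt(beta)
    v = (h * s) @ W.T + noise
    return v, h, s

def damped_update(old, proposed, damping=0.5):
    # Generic damping (an assumption): move only part of the way
    # from the old value toward the proposed one.
    return damping * proposed + (1.0 - damping) * old

# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
W = normalize_columns(rng.standard_normal((16, 8)))
v, h, s = sample_s3c(
    W,
    b=np.full(8, -2.0),      # negative bias -> sparse spikes
    mu=np.ones(8),
    alpha=np.full(8, 4.0),
    beta=np.full(16, 10.0),
    n_samples=5,
    rng=rng,
)
```

In this sketch the bias `b` alone sets how often a unit is active, while `mu` and `alpha` set the scale of an active unit, which is the separation of sparsity from magnitude that the model description emphasizes.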

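Experimental Results

Research Questions

RQ1: Does a hybrid model that combines a spike-and-slab prior with sparse coding improve unsupervised feature discovery over standard sparse coding or the ssRBM?
RQ2: Does decoupling sparsity (controlled by $b_i$) from activation magnitude (controlled by $\mu_i, \alpha_{ii}$) yield better generalization and more interpretable features?
RQ3: Can structured variational inference, accelerated on GPUs, make S3C training scale to large datasets such as CIFAR-10?
RQ4: How does S3C perform in semi-supervised learning when only a small amount of labeled data is available?
RQ5: Do S3C features transfer effectively to new tasks in a real transfer learning challenge?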

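Key Findings

With a 3×3 pooling grid, S3C reaches 78.3% test accuracy on CIFAR-10, comparable to sparse coding with its natural encoding and better than the ssRBM.
With a 2×2 pooling grid, S3C reaches 76.2% accuracy, showing that it remains robust with fewer features.
S3C outperforms the ssRBM, which needs 4,096 basis vectors and a 3×3 grid to reach 76.7% accuracy.
In semi-supervised learning, S3C generalizes better on medium-sized labeled subsets, which suggests that it offers flexible regularization.
S3C won the NIPS 2011 Transfer Learning Challenge with 48.6% test accuracy, using only unlabeled data for feature learning and a small amount of labeled data for fine-tuning.
The structured variational inference procedure scales training to large datasets and high-dimensional latent spaces, which makes S3C suitable for use in deep generative models.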
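To make the 3×3 and 2×2 pooling grids concrete, the sketch below pools a spatial map of feature activations into a fixed grid and flattens the result for a classifier. It is an illustration under stated assumptions: average pooling and the `(height, width, n_features)` layout are choices made here, the function name `grid_pool` is hypothetical, and the digest does not specify the pooling operator the authors used.

```python
import numpy as np

def grid_pool(feature_map, grid):
    """Average-pool an (H, W, K) feature map into grid x grid cells.

    Returns a flat vector of length grid * grid * K.
    """
    # Split row and column indices into `grid` nearly equal groups.
    rows = np.array_split(np.arange(feature_map.shape[0]), grid)
    cols = np.array_split(np.arange(feature_map.shape[1]), grid)
    pooled = np.stack([
        np.stack([feature_map[np.ix_(r, c)].mean(axis=(0, 1)) for c in cols])
        for r in rows
    ])
    return pooled.reshape(-1)

# Toy example: 27 x 27 spatial positions, 4 features per position.
toy_map = np.ones((27, 27, 4))
features_3x3 = grid_pool(toy_map, 3)   # 3 * 3 * 4 = 36 values
features_2x2 = grid_pool(toy_map, 2)   # 2 * 2 * 4 = 16 values
```

A finer grid keeps more spatial detail and produces a longer feature vector, which is why the 2×2 setting corresponds to fewer features than the 3×3 setting.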


This digest was generated by AI and reviewed by human editors.