QUICK REVIEW

[Paper Review] LassoNet: A Neural Network with Feature Sparsity

Ismael Lemhadri, Feng Ruan | arXiv (Cornell University) | Jul 29, 2019

Statistical Methods and Inference | References: 60 | Cited by: 59

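One-Sentence Summary

LassoNet adds a skip (residual) layer to a neural network and imposes a hierarchy constraint, which yields global feature selection and a regularization path of sparse feature subsets.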

ABSTRACT

Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However the Lasso only applies to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach enforces a hierarchy: specifically a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so integrates feature selection with the parameter learning directly. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. On systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
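To make the hierarchy described in the abstract concrete, here is a minimal NumPy sketch of a forward pass with a linear skip path next to a hidden layer. The single ReLU hidden layer, the function name `lassonet_forward`, and all variable names are illustrative assumptions, not the authors' code. The sketch only shows that a feature whose skip weight and first-layer weights are all zero cannot influence the output.

```python
import numpy as np

def lassonet_forward(x, theta, W1, b1, w2, b2):
    """Skip (linear) path plus a one-hidden-layer ReLU network.

    x: (d,) input, theta: (d,) skip weights, W1: (K, d) first-layer weights.
    All names and shapes are hypothetical, chosen for illustration.
    """
    hidden = np.maximum(W1 @ x + b1, 0.0)  # nonlinear path
    return theta @ x + w2 @ hidden + b2    # linear skip path + nonlinear path

rng = np.random.default_rng(0)
d, K = 4, 3
theta = rng.normal(size=d)
W1 = rng.normal(size=(K, d))
b1 = rng.normal(size=K)
w2 = rng.normal(size=K)
b2 = 0.5

# Deactivate feature 2: zero skip weight and zero first-layer column.
theta[2] = 0.0
W1[:, 2] = 0.0

x = rng.normal(size=d)
x_perturbed = x.copy()
x_perturbed[2] += 10.0  # change only the deactivated feature

y = lassonet_forward(x, theta, W1, b1, w2, b2)
y_perturbed = lassonet_forward(x_perturbed, theta, W1, b1, w2, b2)
print(np.isclose(y, y_perturbed))
```

Because the deactivated feature enters the model only through `theta[2]` and the column `W1[:, 2]`, setting both to zero removes it from every path to the output.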

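Research Motivation and Objectives

Motivate feature selection in neural networks and address the limitation that the Lasso applies only to linear models.
Introduce a neural network framework that enforces feature sparsity through a skip-layer mechanism.
Develop a proximal gradient optimization scheme, built on a novel Hier-Prox operator, to train the model.
Provide a regularization path that spans a range of feature sparsity levels, and demonstrate computational efficiency.
Demonstrate empirical superiority over existing feature selection methods on real datasets.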

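Proposed Method

Define the objective as the empirical loss plus an $\ell_1$ penalty on the skip-layer weights $\theta$.
Tie the first-layer weights $W^{(1)}$ to the skip weights through $\|W^{(1)}_j\|_\infty \le M |\theta_j|$, which enforces the hierarchy.
Train in two steps: a standard gradient step followed by a per-feature hierarchical proximal update (Hier-Prox).
Use a warm-start strategy to trace the regularization path from dense to sparse solutions.
Show that Hier-Prox decomposes across features and has complexity O(p log p), where p is the number of parameters.
Extend the framework to the unsupervised setting, where Group-Hier-Prox selects features shared across outputs.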
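Written out, the first two items above correspond to the following constrained problem. The symbols $L$ (empirical loss), $\lambda$ (penalty strength), and $d$ (number of input features) are notation assumed here for illustration; $\theta$, $W^{(1)}$, and $M$ are as in the list.

$$
\min_{\theta,\, W} \; L(\theta, W) + \lambda \|\theta\|_1
\quad \text{subject to} \quad
\|W^{(1)}_j\|_\infty \le M\, |\theta_j|, \qquad j = 1, \dots, d.
$$

When $\theta_j = 0$, the constraint forces every entry of $W^{(1)}_j$ to zero, so feature $j$ is removed from both the linear and the nonlinear parts of the network. Increasing $\lambda$ drives more skip weights to zero, which is what produces the path from dense to sparse solutions.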

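Experimental Results

Research Questions

RQ1: Can a neural network achieve global feature selection while retaining predictive power?
RQ2: Does the hierarchical sparsity constraint yield a controllable regularization path over feature subsets?
RQ3: How can proximal gradient methods be adapted efficiently to enforce the feature hierarchy in a neural network?
RQ4: How large are LassoNet's empirical gains in accuracy and feature parsimony over existing feature selection methods?
RQ5: Can LassoNet be extended to unsupervised learning and matrix completion?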

Key Findings

Dataset	(n, d)	# Classes	All features	Fisher	HSIC-Lasso	PFA	LassoNet
Mice Protein	1080, 77	8	0.990	0.944	0.958	0.939	0.958
MNIST handwritten digits	10000, 784	10	0.928	0.813	0.870	0.873	0.873
MNIST-Fashion	10000, 784	10	0.833	0.671	0.785	0.793	0.800
ISOLET	7797, 617	26	0.953	0.793	0.877	0.863	0.885
COIL-20	1440, 400	20	0.996	0.986	0.972	0.975	0.991
Activity	5744, 561	6	0.853	0.769	0.829	0.779	0.849

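LassoNet matches or outperforms the compared feature selection methods (Fisher, HSIC-Lasso, PFA) on all six real-world datasets in the table above.
The method yields interpretable feature subsets while maintaining high predictive accuracy.
The regularization path provides a controllable trade-off between feature sparsity and performance.
Dense-to-sparse warm starts help improve generalization and avoid poor local minima.
Hier-Prox attains the global optimum of its proximal subproblem and scales as O(p log p).
Extensions to unsupervised learning and matrix completion demonstrate the generality of the framework.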
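The per-feature Hier-Prox update referred to above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the subproblem for one feature is to minimize (1/2)(theta - b)^2 + (1/2)||w - W||^2 + lam * |b| over (b, W) subject to max|W_k| <= M * |b|, and the function name `hier_prox` and its arguments are hypothetical. The sort is the dominant cost, which is consistent with the log factor in the stated complexity.

```python
import numpy as np

def hier_prox(theta, w, lam, M):
    """Sketch of a hierarchical proximal update for ONE feature.

    theta: scalar skip weight; w: 1-D array of that feature's first-layer weights.
    Assumed subproblem (see lead-in): shrink theta by lam and clip |w| at M * |theta_new|.
    """
    w = np.asarray(w, dtype=float)
    K = w.size
    abs_sorted = np.sort(np.abs(w))[::-1]                    # |w_(1)| >= ... >= |w_(K)|
    cumsum = np.concatenate(([0.0], np.cumsum(abs_sorted)))  # partial sums, length K + 1
    m = np.arange(K + 1)
    # Candidate clipping level if exactly m weights end up clipped.
    cand = M / (1.0 + m * M**2) * np.maximum(abs(theta) - lam + M * cumsum, 0.0)
    # First m whose candidate is at least the next-largest weight |w_(m+1)| (0 when m == K).
    lower = np.concatenate((abs_sorted, [0.0]))
    m_star = int(np.argmax(lower <= cand))
    t = cand[m_star]
    theta_new = np.sign(theta) * t / M
    w_new = np.sign(w) * np.minimum(np.abs(w), t)
    return theta_new, w_new

# The largest weight is clipped so that max|w_new| == M * |theta_new|.
theta_a, w_a = hier_prox(1.0, [0.5, -2.0], lam=0.1, M=1.0)
print(theta_a, w_a)

# A weak feature under a strong penalty is zeroed out entirely.
theta_b, w_b = hier_prox(0.05, [0.001], lam=0.1, M=10.0)
print(theta_b, w_b)
```

In the first call the constraint holds with equality after the update. In the second, the penalty exceeds the feature's signal, so both the skip weight and the first-layer weight become zero, which is how a feature leaves the model along the regularization path.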

This review was generated by AI and reviewed by a human editor.