QUICK REVIEW

[Paper Review] ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

Sujith Ravi|arXiv (Cornell University)|Aug 2, 2017

Advanced Neural Network Applications|References: 25|Cited by: 43

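One-Sentence Summary

ProjectionNet proposes a joint training framework that distills a large, high-accuracy deep neural network into a compact, efficient model by means of neural projections, that is, random projections that map activations to low-bit representations. The approach sharply reduces memory and compute requirements while maintaining high accuracy, showing that as few as 720 bits are enough to retain more than 90% of the full network's performance on CIFAR-100.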

ABSTRACT

Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly "on" device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks--a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler "projection" network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question "how many neural bits are required to solve a given task?" using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets.

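Motivation and Objectives

Address the challenge of deploying large deep neural networks on memory-constrained devices such as smartphones and smartwatches.
Overcome the accuracy loss that post-training compression techniques tend to incur.
Develop a joint optimization framework that trains a lightweight projection network to mimic the behavior of a full, high-performing network.
Investigate the minimum number of neural bits needed to retain the predictive capacity of a deep network.
Learn compact models through projection-based imitation training, enabling efficient, low-memory inference.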

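Proposed Method

Use a full, high-capacity neural network (e.g., a feed-forward network or an RNN) as a "teacher" (the "trainer" network in the abstract) that supervises a smaller, lightweight "projection" network.
Apply random projections based on locality-sensitive hashing (LSH) to convert inputs or hidden-layer representations into binary vectors (neural bits).
Train the two networks jointly with backpropagation so that the projection network learns to mimic the teacher network's outputs.
Optimize the projection network with a combined loss that has two terms: one for prediction accuracy (matching the ground-truth labels) and one for staying close to the teacher network's predictions.
Represent the projection network as a discrete, bit-level model, enabling highly efficient inference with very low memory and compute overhead.
Extend the framework to structured prediction tasks using graph-structured loss functions, with end-to-end joint learning of the teacher and projection networks.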
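
The projection and loss steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the bits come from sign random projections (one common LSH family), and the function names, the Gaussian hyperplanes, the weights `w_trainer`, `w_mimic`, `w_label`, and the separate term for the trainer's own label loss are all hypothetical choices made for this example.

```python
import math
import random


def make_projections(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (assumed fixed, not trained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def project_to_bits(x, hyperplanes):
    """Sign random projection: one bit per hyperplane, 1 if <x, h> >= 0 else 0."""
    return [1 if sum(xi * hi for xi, hi in zip(x, h)) >= 0.0 else 0
            for h in hyperplanes]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) between two distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def joint_loss(label, trainer_logits, projection_logits,
               w_trainer=1.0, w_mimic=1.0, w_label=1.0):
    """Weighted sum of three terms (weights are assumptions for this sketch):
    trainer vs. label, projection vs. trainer, projection vs. label."""
    p_trainer = softmax(trainer_logits)
    p_projection = softmax(projection_logits)
    trainer_term = cross_entropy(label, p_trainer)
    mimic_term = cross_entropy(p_trainer, p_projection)
    label_term = cross_entropy(label, p_projection)
    return w_trainer * trainer_term + w_mimic * mimic_term + w_label * label_term


# Usage: 16 neural bits for a 4-dimensional input, then a loss for one example.
planes = make_projections(num_bits=16, dim=4)
bits = project_to_bits([0.5, -1.2, 0.3, 2.0], planes)
loss = joint_loss([0.0, 1.0, 0.0], [0.1, 2.0, -1.0], [0.0, 1.0, 0.0])
print(bits, loss)
```

In a real system the logits would come from the two networks and the gradients from an autodiff framework; the sketch only shows how the bit vector and the combined objective fit together.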

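Experimental Results

Research Questions

RQ1: How many neural bits are needed to capture the predictive capacity of a full deep neural network on a given task?
RQ2: Can a lightweight projection network trained by joint optimization reach accuracy comparable to the full network while using far less memory?
RQ3: To what extent do random projections preserve the representational power of deep network activations in a low-dimensional bit space?
RQ4: How well does the framework generalize across network architectures and tasks, including visual and text classification?
RQ5: Can the projection framework be extended to semi-supervised or graph-structured learning settings and combined with structured loss functions?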
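
As background for RQ3, a well-known property of sign random projections (a general LSH result, not a finding of this paper) is that the fraction of bits on which two vectors disagree estimates the angle between them divided by π. The sketch below checks this numerically; the function names and the choice of Gaussian hyperplanes are assumptions made for illustration.

```python
import math
import random


def sign_bits(x, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is non-negative, else 0."""
    return [1 if sum(a * b for a, b in zip(x, h)) >= 0.0 else 0
            for h in hyperplanes]


def hamming_fraction(bits_a, bits_b):
    """Fraction of positions where the two bit vectors differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b)) / len(bits_a)


def angle_over_pi(x, y):
    """Exact angle between x and y, divided by pi."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine) / math.pi


def compare(x, y, num_bits, seed=0):
    """Return (bit-based estimate, exact value) of angle(x, y) / pi."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(num_bits)]
    estimate = hamming_fraction(sign_bits(x, planes), sign_bits(y, planes))
    return estimate, angle_over_pi(x, y)


# Usage: two vectors 45 degrees apart; more bits give a tighter estimate.
for n in (120, 720, 4000):
    print(n, compare([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], n))
```

The estimate tightens as the number of bits grows, which gives some intuition for why accuracy can depend strongly on the bit budget.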

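Key Findings

On MNIST, a 100-bit ProjectionNet reaches about 80% of the accuracy of the full 3-layer feed-forward network, showing very high efficiency.
On CIFAR-100, a 720-bit ProjectionNet recovers more than 90% of the full network's predictive performance, indicating strong representational power even at very low bit counts.
On CIFAR-100, predictive performance rises sharply between 120 and 720 bits, suggesting a critical threshold for effective representation.
The joint training framework enables end-to-end optimization, substantially reducing model size while preserving accuracy.
The method supports flexible adjustment of model size and adapts to a range of network architectures and learning settings, including graph-based and semi-supervised learning.
The framework enables on-device inference at low memory and compute cost, making it suitable for privacy-sensitive settings and environments with limited network connectivity.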

This review was generated by AI and reviewed by a human editor.