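[Paper Walkthrough] N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning
This paper proposes a two-stage reinforcement learning approach (first remove layers, then shrink the remaining ones). It uses policy gradients and knowledge distillation to automatically compress a teacher network into an accurate, lightweight student network.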
While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger 'teacher' network as input and outputs a compressed 'student' network derived from the 'teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large 'teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward: a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input 'teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller 'teacher' networks can be used to rapidly speed up training on larger 'teacher' networks.
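The two-stage transformation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Layer`, `remove_layers`, and `shrink_layers` are hypothetical names, each layer is reduced to a single output width, and shrink factors are assumed to lie in (0, 1]. The keep/remove flags and shrink factors are hard-coded here, whereas in the method they are sampled from the two recurrent policy networks. Shape compatibility between consecutive layers after removal is not checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    # Hypothetical minimal layer description: a name and an output width
    # (channels for a conv layer, units for a fully connected layer).
    name: str
    width: int


def remove_layers(layers: List[Layer], keep: List[bool]) -> List[Layer]:
    """Stage 1: apply one binary keep/remove action per teacher layer."""
    if len(keep) != len(layers):
        raise ValueError("need exactly one keep/remove action per layer")
    return [layer for layer, k in zip(layers, keep) if k]


def shrink_layers(layers: List[Layer], factors: List[float]) -> List[Layer]:
    """Stage 2: scale each remaining layer's width by a factor in (0, 1] (assumed range)."""
    if len(factors) != len(layers):
        raise ValueError("need exactly one shrink factor per remaining layer")
    out = []
    for layer, f in zip(layers, factors):
        if not 0.0 < f <= 1.0:
            raise ValueError(f"shrink factor must be in (0, 1], got {f}")
        # Round to an integer width and keep at least one unit so the layer stays valid.
        out.append(Layer(layer.name, max(1, round(layer.width * f))))
    return out


teacher = [Layer("conv1", 64), Layer("conv2", 64), Layer("conv3", 128), Layer("fc", 256)]
stage1 = remove_layers(teacher, keep=[True, False, True, True])  # drop conv2
student = shrink_layers(stage1, factors=[0.5, 0.25, 1.0])
print([(layer.name, layer.width) for layer in student])
# → [('conv1', 32), ('conv3', 32), ('fc', 256)]
```

Splitting the actions this way mirrors the macro/micro separation: stage 1 only decides which layers survive, so stage 2 has fewer, already-selected layers to resize.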
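Motivation and Goals
- Motivate automated, data-driven network compression that satisfies hardware constraints.
- Develop a principled RL framework for searching compact architectures derived from the teacher network.
- Propose a two-stage action scheme (remove layers first, then shrink layers) to explore the architecture space efficiently.
- Incorporate knowledge distillation into training the compressed student models.
- Demonstrate compression results on multiple datasets and verify that the learned policies transfer.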
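Proposed Method
- Formulate teacher-to-student compression as a Markov decision process over network architectures: states are architectures, and actions transform the current architecture.
- Use two-stage policy networks: a layer-removal policy that makes a binary keep/remove choice for each layer, and a layer-shrinkage policy that continuously scales down the parameters of each remaining layer.
- Optimize both policies with the REINFORCE policy gradient, using a reward that combines compression and accuracy: R = Rc × Ra.
- Rc is a nonlinear compression reward computed from the parameter count; Ra is the ratio of the student's validation accuracy to the teacher's.
- Incorporate hardware constraints of the form Ax ≤ b by relaxing them into conditions on the reward, with a graded penalty for violations.
- Train each student network with knowledge distillation from the teacher's logits (an L2 loss between the student's outputs and the teacher's logits) to guide learning.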
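The reward R = Rc × Ra from the method list can be written out as below. This sketch assumes the nonlinear compression term Rc = C(2 - C), where C = 1 - (student parameters / teacher parameters) is the relative compression; treat that form as an assumption to check against the paper. The constraint handling is also hypothetical: `constraint_violation`, the vector `x` of architecture quantities, and the linear `penalty_scale` penalty only illustrate one way to grade violations of Ax ≤ b, not the penalty the authors use.

```python
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def compression_reward(student_params: int, teacher_params: int) -> float:
    """Assumed form R_c = C * (2 - C), where C = 1 - student_params / teacher_params."""
    c = 1.0 - student_params / teacher_params
    return c * (2.0 - c)


def accuracy_reward(student_acc: float, teacher_acc: float) -> float:
    """R_a = student validation accuracy / teacher validation accuracy."""
    return student_acc / teacher_acc


def constraint_violation(A: Matrix, x: Sequence[float], b: Sequence[float]) -> float:
    """Total amount by which Ax <= b is violated (0.0 when every row holds)."""
    return sum(max(0.0, sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i)
               for row, b_i in zip(A, b))


def reward(student_params: int, teacher_params: int,
           student_acc: float, teacher_acc: float,
           A: Optional[Matrix] = None, x: Optional[Sequence[float]] = None,
           b: Optional[Sequence[float]] = None, penalty_scale: float = 1.0) -> float:
    """R = R_c * R_a; if constraints are given and violated, return a hypothetical graded penalty."""
    if A is not None and x is not None and b is not None:
        v = constraint_violation(A, x, b)
        if v > 0.0:
            # Larger violations give a more negative reward (illustrative assumption).
            return -penalty_scale * v
    return compression_reward(student_params, teacher_params) * accuracy_reward(student_acc, teacher_acc)


r = reward(100_000, 1_000_000, student_acc=0.72, teacher_acc=0.73)
print(round(r, 4))
```

With one tenth of the teacher's parameters, C = 0.9 and Rc = 0.9 × 1.1 = 0.99. Because dRc/dC = 2(1 - C), this shape rewards early reductions strongly and saturates as C approaches 1, so accuracy loss quickly dominates once a student is already small.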
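The REINFORCE update for the layer-removal stage can be sketched with independent Bernoulli keep/remove decisions. Assumptions: the recurrent policy is replaced by one free logit per layer, an exponential moving average of past rewards serves as the baseline (a common variance-reduction choice, not stated above), and `toy_reward` is a hypothetical stand-in for training and evaluating a real student. For a keep action a ∈ {0, 1} with probability p = sigmoid(z), the score function is d log π(a)/dz = a - p, so each step moves the logits by lr × (R - baseline) × (a - p).

```python
import math
import random
from typing import Callable, List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_keep_actions(logits: List[float], rng: random.Random) -> Tuple[List[int], List[float]]:
    """Sample one keep (1) / remove (0) action per layer from Bernoulli(sigmoid(z))."""
    probs = [sigmoid(z) for z in logits]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return actions, probs


def reinforce_grad(actions: List[int], probs: List[float], reward: float, baseline: float) -> List[float]:
    """Score-function estimate: (R - b) * d/dz log Bernoulli(a; sigmoid(z)) = (R - b) * (a - p)."""
    advantage = reward - baseline
    return [advantage * (a - p) for a, p in zip(actions, probs)]


def train_removal_policy(num_layers: int, reward_fn: Callable[[List[int]], float],
                         steps: int = 2000, lr: float = 0.5, beta: float = 0.9,
                         seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * num_layers  # keep probability starts at 0.5 for every layer
    baseline = 0.0
    for _ in range(steps):
        actions, probs = sample_keep_actions(logits, rng)
        r = reward_fn(actions)
        grad = reinforce_grad(actions, probs, r, baseline)
        logits = [z + lr * g for z, g in zip(logits, grad)]  # gradient ascent on expected reward
        baseline = beta * baseline + (1.0 - beta) * r         # moving-average baseline
    return logits


def toy_reward(actions: List[int]) -> float:
    """Hypothetical reward: removing layer 0 breaks the net (R = 0); else R = fraction of other layers removed."""
    if actions[0] == 0:
        return 0.0
    others = actions[1:]
    return (len(others) - sum(others)) / len(others)


logits = train_removal_policy(num_layers=4, reward_fn=toy_reward)
print([round(sigmoid(z), 3) for z in logits])
```

Under this toy reward, the policy should learn to keep layer 0 and remove the rest. In the actual method each reward evaluation requires training a student, which is why sample-efficient search and policy transfer matter.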
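The distillation objective in the last bullet can be sketched as a plain L2 loss on logits. This sketch assumes a batch mean of squared Euclidean distances as the normalization (constant factors vary between implementations) and treats the teacher's logits as fixed targets. `logit_l2_loss` and `logit_l2_grad` are hypothetical helper names; a real setup would compute them with an autodiff framework.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def logit_l2_loss(student_logits: Matrix, teacher_logits: Matrix) -> float:
    """(1/N) * sum_n ||s_n - t_n||^2 over a batch of N logit vectors."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("batch sizes differ")
    n = len(student_logits)
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        total += sum((s - t) ** 2 for s, t in zip(s_row, t_row))
    return total / n


def logit_l2_grad(student_logits: Matrix, teacher_logits: Matrix) -> List[List[float]]:
    """Gradient w.r.t. the student logits: 2 * (s - t) / N (teacher logits are constants)."""
    n = len(student_logits)
    return [[2.0 * (s - t) / n for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(student_logits, teacher_logits)]


student = [[1.0, 2.0], [0.0, 0.0]]
teacher = [[1.0, 0.0], [1.0, 1.0]]
print(logit_l2_loss(student, teacher))  # → 3.0
print(logit_l2_grad(student, teacher))  # → [[0.0, 2.0], [-1.0, -1.0]]
```

Matching raw logits rather than softmax probabilities keeps information about the teacher's relative confidence across all classes, including the ones it assigns near-zero probability.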
Experimental Results
Research Questions
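- RQ1: Can reinforcement learning automatically discover compact student architectures that preserve accuracy when compressing a larger teacher network?
- RQ2: Does the two-stage action strategy (layer removal followed by layer shrinkage) scale to modern architectures and datasets?
- RQ3: How well do learned compression policies transfer across similar architectures or to larger teachers?
- RQ4: How can hardware constraints be integrated into the reward effectively to produce practical models?
- RQ5: Does distillation from the teacher improve the performance of the compressed student networks?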
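Key Findings
- Achieves substantial compression (more than 10x) on models such as ResNet-34 while keeping accuracy close to the teacher's.
- Two-stage policy learning speeds up the search by separating macro decisions (layer removal) from micro decisions (layer shrinkage).
- Policies learned on smaller teachers transfer to larger teachers, speeding up training in the new setting.
- Outperforms pruning baselines and knowledge distillation baselines with hand-designed student architectures on multiple datasets (MNIST, CIFAR-10/100, SVHN, Caltech-256).
- With a hardware-constrained reward, the method produces models that satisfy size constraints, demonstrating practical applicability.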
This walkthrough was generated by AI and reviewed by human editors.