[Paper Review] ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks
ReLeQ uses proximal policy optimization to automatically assign heterogeneous sub-8-bit quantization levels to each layer of a DNN. Starting from a full-precision model, it stays close to the original accuracy while delivering substantial hardware speedups and energy reductions.
Deep Neural Networks (DNNs) typically require a massive amount of computational resources in inference tasks for computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of network encodings. Recent research affirms that carefully selecting the quantization levels for each layer can preserve the accuracy while pushing the bitwidth below eight bits. However, without arduous manual effort, this deep quantization can lead to significant accuracy loss, leaving it in a position of questionable utility. As such, deep quantization opens a large hyper-parameter space (bitwidth of the layers), the exploration of which is a major challenge. We propose a systematic approach to tackle this problem, by automating the process of discovering the quantization levels through an end-to-end deep reinforcement learning framework (ReLeQ). We adapt policy optimization methods to the problem of quantization, and focus on finding the best design decisions in choosing the state and action spaces, network architecture and training framework, as well as the tuning of various hyperparameters. We show how ReLeQ can balance speed and quality, and provide an asymmetric general solution for quantization of a large variety of deep networks (AlexNet, CIFAR-10, LeNet, MobileNet-V1, ResNet-20, SVHN, and VGG-11) that virtually preserves the accuracy (≤ 0.3% loss) while minimizing the computation and storage cost. With these DNNs, ReLeQ enables conventional hardware to achieve 2.2x speedup over 8-bit execution. Similarly, a custom DNN accelerator achieves 2.0x speedup and energy reduction compared to 8-bit runs. These encouraging results mark ReLeQ as the initial step towards automating the deep quantization of neural networks.
Motivation and Objectives
- Automate discovery of per-layer quantization levels (below 8 bits) to preserve accuracy.
- Explore how heterogeneous (layer-wise) bitwidths affect overall network performance.
- Demonstrate an end-to-end RL framework that balances accuracy with compute and memory cost.
- Provide a practical, hardware-agnostic method that works with conventional CPUs and custom accelerators.
Proposed Method
- Formulate layer-wise quantization level selection as a multi-objective RL problem with accuracy prioritized over quantization cost.
- Use an LSTM-based policy and a value network within Proximal Policy Optimization (PPO) to learn bitwidth selection sequentially across layers.
- Define state embeddings that include static layer features and dynamic metrics such as current bitwidths and relative accuracy.
- Employ asymmetric reward shaping that heavily penalizes accuracy loss while encouraging bitwidth reduction (the quantization state).
- Train the agent starting from a full-precision model and perform short or long retraining depending on network depth to evaluate quantized performance.
- Quantize weights using WRPN-style mid-tread quantization, assigning bitwidths at per-layer granularity.
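The weight quantization step listed in the method can be illustrated with a small uniform mid-tread quantizer. This is a minimal sketch, assuming weights are clipped to [-1, 1] and that one of the `bits` acts as a sign bit, which is how WRPN-style schemes are commonly described. The function name `quantize_weights` is hypothetical, and this is not the authors' implementation.

```python
def quantize_weights(weights, bits):
    """Uniform mid-tread quantization of weights clipped to [-1, 1].

    Assumption: 1 sign bit plus (bits - 1) magnitude bits, so there are
    2**(bits - 1) - 1 positive steps and zero is itself a representable level.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits (1 sign bit + 1 magnitude bit)")
    levels = 2 ** (bits - 1) - 1  # number of positive quantization steps
    quantized = []
    for w in weights:
        clipped = max(-1.0, min(1.0, w))  # keep the weight inside [-1, 1]
        # Snap to the nearest step. Python's round() sends exact ties to the
        # nearest even integer, which is fine for an illustration.
        quantized.append(round(clipped * levels) / levels)
    return quantized


if __name__ == "__main__":
    layer_weights = [0.0, 0.3, -0.7, 1.5]
    for bits in (2, 3, 8):
        print(bits, quantize_weights(layer_weights, bits))
```

With a per-layer bitwidth list such as the ones reported in the results table, a function like this would be called once per layer with that layer's `bits` value.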
Experimental Results
Research Questions
- RQ1: Can an RL agent autonomously discover heterogeneous per-layer bitwidths that preserve accuracy while reducing compute and storage?
- RQ2: How do layer-level bitwidth decisions interact across layers, and can an RL framework capture this interplay?
- RQ3: What reward design best guides the agent to converge to Pareto-optimal quantization patterns?
- RQ4: What hardware and software performance gains (speedup, energy reduction) are achievable with the discovered quantization patterns?
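The asymmetric reward shaping described in the proposed method, which RQ3 asks about, can be sketched as follows. This is an illustrative shape only, not the formula from the paper: the function name `asymmetric_reward`, the threshold, and both exponents are assumptions chosen so that accuracy loss is punished more strongly than bitwidth reduction is rewarded. `rel_accuracy` is assumed to be quantized accuracy divided by full-precision accuracy, and `rel_bits` the average bitwidth divided by 8.

```python
def asymmetric_reward(rel_accuracy, rel_bits,
                      acc_threshold=0.4, acc_exponent=4.0, bits_exponent=0.5):
    """Toy asymmetric reward: accuracy dominates, bit savings come second.

    All default values are hypothetical and only serve the illustration.
    """
    if not 0.0 < rel_bits <= 1.0:
        raise ValueError("rel_bits must lie in (0, 1]")
    if rel_accuracy < acc_threshold:
        return -1.0  # flat penalty once accuracy collapses
    acc = min(rel_accuracy, 1.0)           # no bonus for exceeding full precision
    bit_gain = 1.0 - rel_bits ** bits_exponent  # grows as bitwidth shrinks
    acc_discount = acc ** acc_exponent          # steep drop for small accuracy losses
    return bit_gain * acc_discount


if __name__ == "__main__":
    # Same accuracy, fewer bits: reward increases.
    print(asymmetric_reward(1.0, 0.5), asymmetric_reward(1.0, 0.4))
    # Trading 10% relative accuracy for a 0.1 drop in rel_bits: reward decreases.
    print(asymmetric_reward(0.9, 0.4))
```

The steep accuracy exponent is what makes the shape asymmetric: under these assumed values, a configuration that saves bits but loses accuracy scores lower than a slightly wider configuration that keeps accuracy intact.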
Key Findings
| Network | Dataset | Quantization Bitwidths | Average Bitwidth | Accuracy Loss (%) |
|---|---|---|---|---|
| AlexNet | ImageNet | {8,4,4,4,4,4,4,8} | 5 | 0.08 |
| SimpleNet | CIFAR10 | {5,5,5,5,5} | 5 | 0.30 |
| LeNet | MNIST | {2,2,3,2} | 2.25 | 0.00 |
| MobileNet | ImageNet | {8,5,6,6,4,4,7,8,4,6,8,5,5,8,6,7,7,7,6,8,6,8,8,6,7,5,5,7,8,8} | 6.43 | 0.26 |
| ResNet-20 | CIFAR10 | {8,2,2,3,2,2,2,3,2,3,3,3,2,2,2,3,2,2,2,2,2,8} | 2.81 | 0.12 |
| SVHN-10 | SVHN | {8,4,4,4,4,4,4,4,4,8} | 4.80 | 0.00 |
| VGG-11 | CIFAR10 | {8,5,8,5,6,6,6,6,8} | 6.44 | 0.17 |
| VGG-16 | CIFAR10 | {8,8,8,6,8,6,8,6,8,6,8,6,8,6,8,8} | 7.25 | 0.10 |
- ReLeQ achieves sub-8-bit per-layer quantization with accuracy loss ≤ 0.3% across a range of networks (e.g., AlexNet, LeNet, MobileNet, ResNet-20, VGG-11).
- Average bitwidths produced by ReLeQ vary by network (e.g., MobileNet 6.43 bits; ResNet-20 2.81 bits), showing heterogeneity across layers.
- On conventional hardware (TVM on CPU), ReLeQ yields ~2.2× speedup over 8-bit runs; on a custom Stripes accelerator, ~2.0× speedup and ~2.0× energy reduction over 8-bit baselines.
- Compared to ADMM-based quantization, ReLeQ provides higher performance and energy benefits across tested scenarios.
- Reward formulation strongly impacts convergence speed and final accuracy, with the proposed asymmetric reward achieving faster convergence and better relative accuracy.
- Pareto frontier analyses for several networks show that ReLeQ finds solutions in the desirable region of the frontier, close to recoverable accuracy.
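The average bitwidths in the table above can be re-derived from the per-layer lists. The short sketch below assumes the average is an unweighted mean over layers, which matches the reported values; the helper name `average_bitwidth` is hypothetical, and only a few of the table's rows are copied in.

```python
# Per-layer bitwidths copied from the results table (subset of rows).
BITWIDTHS = {
    "AlexNet": [8, 4, 4, 4, 4, 4, 4, 8],
    "LeNet": [2, 2, 3, 2],
    "ResNet-20": [8, 2, 2, 3, 2, 2, 2, 3, 2, 3, 3,
                  3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 8],
    "VGG-11": [8, 5, 8, 5, 6, 6, 6, 6, 8],
}


def average_bitwidth(bits):
    """Unweighted mean of the per-layer bitwidths (assumed definition)."""
    return sum(bits) / len(bits)


if __name__ == "__main__":
    for name, bits in BITWIDTHS.items():
        avg = average_bitwidth(bits)
        # Ratio to a uniform 8-bit assignment, as a rough size indicator only.
        print(f"{name}: {len(bits)} layers, average {avg:.3f} bits, "
              f"{avg / 8:.2f} of 8-bit")
```

Because the mean is unweighted, it does not account for how many weights each layer holds, so it is only a rough indicator of storage or compute savings.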
This review was generated by AI and checked by a human editor.