QUICK REVIEW

[Paper Review] ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

Xiaoliang Dai, Peizhao Zhang | arXiv (Cornell University) | Dec 21, 2018

Advanced Neural Network Applications | 31 references | 17 citations

TL;DR

ChamNet proposes a platform-aware neural architecture adaptation framework that tunes existing efficient building blocks to meet target latency and energy constraints using fast accuracy and resource predictors. It combines a Gaussian Process accuracy predictor, sampled iteratively via Bayesian optimization, with hardware-specific operator latency look-up tables. The resulting models reach 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP, respectively, and once the predictors are built, each search takes only minutes.

ABSTRACT

This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).

Motivation & Objective

To address the challenge of deploying efficient neural networks across diverse hardware platforms with varying resource constraints.
To reduce the time and computational cost of neural architecture search by replacing expensive training and measurement with predictive modeling.
To improve model accuracy without increasing latency or energy consumption through intelligent computation resource distribution.
To enable large-scale deployment of compact models across heterogeneous devices by minimizing per-platform search overhead.

Proposed method

Formulates platform-aware neural architecture search as an optimization problem using accuracy and resource (latency/energy) predictors.
Employs a Gaussian Process-based Bayesian optimization framework to iteratively sample high-accuracy architectures with minimal evaluation cost.
Uses an operator latency look-up table (LUT) for fast, accurate latency estimation tailored to specific hardware platforms.
Introduces imbalanced quasi-Monte Carlo sampling to improve the efficiency and robustness of the accuracy and resource predictors.
Adapts computation distribution across network stages by redistributing FLOPs based on hardware characteristics and feature map size.
Builds one-time predictors (accuracy, latency, energy) that amortize search cost across multiple platforms and constraints, reducing total cost from O(m·n·k) to O(m+n).
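
As a rough illustration of the Gaussian Process and Bayesian optimization idea listed above, the sketch below fits a small GP to the architectures evaluated so far and picks the next one to evaluate by an upper confidence bound. Everything specific here is an assumption made for the example, not the paper's configuration: the RBF kernel and its length scale, the noise level, the UCB acquisition rule, and the names `gp_posterior` and `bayes_opt`. Architectures are assumed to be encoded as tuples of numeric features.

```python
import math

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-4):
    """Posterior mean and variance at x_new; targets are centred on their mean."""
    y_mean = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [t - y_mean for t in y]))
    k_star = [rbf(a, x_new) for a in X]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 0.0)
    return mean, var

def bayes_opt(candidates, evaluate, n_init=2, n_iter=5, kappa=2.0):
    """Evaluate n_init seeds, then repeatedly evaluate the highest-UCB candidate."""
    X = list(candidates[:n_init])
    y = [evaluate(x) for x in X]  # evaluate() stands in for train-and-measure
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in X]
        if not pool:
            break

        def ucb(c):
            mean, var = gp_posterior(X, y, c)
            return mean + kappa * math.sqrt(var)

        nxt = max(pool, key=ucb)
        X.append(nxt)
        y.append(evaluate(nxt))
    return X, y
```

In this sketch `evaluate` is the only expensive call; the GP decides where to spend it, so the number of architectures that must actually be trained stays small. A call such as `bayes_opt([(i / 10,) for i in range(11)], my_eval)` with a user-supplied `my_eval` returns the sampled architectures and their measured scores, which could then serve as training data for the final accuracy predictor.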

Experimental results

Research questions

RQ1: Can we achieve state-of-the-art accuracy on resource-constrained platforms without using reinforcement learning or novel building blocks?
RQ2: How does platform-aware computation redistribution impact model accuracy and efficiency across different hardware?
RQ3: Can predictive models significantly reduce the time and cost of neural architecture search while maintaining high accuracy?
RQ4: What is the impact of FLOP distribution across network stages on inference speed and accuracy on mobile CPUs and DSPs?
RQ5: How does the proposed method compare to existing NAS and compression techniques in terms of accuracy, latency, and search efficiency?

Key findings

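ChamNet achieves 73.8% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU. At reduced latency on the same CPU, it outperforms MobileNetV2 and MnasNet by up to 8.5% and 6.6% absolute top-1 accuracy, respectively.
On a mobile DSP, ChamNet achieves 75.3% top-1 accuracy at 20ms latency. At reduced latency, its absolute top-1 gains reach up to 4.8% over MobileNetV2 and 9.3% over MnasNet.
Against ResNet-101 and ResNet-152, the accuracy gains are 2.7% and 5.6% on an Nvidia GPU, and 4.6% and 2.6% on an Intel CPU, respectively.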
The framework reduces each search to minutes once the one-time predictors are built, avoiding the computationally intensive reinforcement learning search used by methods such as MnasNet.
ChamNet improves CPU utilization by redistributing FLOPs from early to later stages, yielding 2.1% higher accuracy and 5% lower latency than MobileNetV2.
The method achieves 1.7% higher accuracy and 1.75× speedup over MnasNet on a Samsung Galaxy S8 with Snapdragon 835 CPU under 20ms latency.
The framework reduces total search cost from O(m·n·k) to O(m+n), making it highly scalable for large-scale heterogeneous deployment.
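
To make the "minutes, not training runs" finding concrete: once predictors exist, scoring a candidate costs a table lookup and a function call. The sketch below is a minimal illustration under stated assumptions. The three-stage search space over channel widths, the latency values in `LATENCY_LUT_MS`, and the toy `predict_accuracy` function are all invented for the example and do not come from the paper; in practice the table would be measured once on the target device and the accuracy function would be the trained predictor.

```python
from itertools import product
from math import log

# Hypothetical operator latency table in milliseconds, keyed by
# (stage index, channel width). Values are invented for illustration.
LATENCY_LUT_MS = {
    (0, 16): 2.0, (0, 24): 3.0, (0, 32): 4.5,
    (1, 32): 1.5, (1, 48): 2.2, (1, 64): 3.0,
    (2, 64): 1.0, (2, 96): 1.4, (2, 128): 2.0,
}
NUM_STAGES = 3

def predict_latency_ms(widths):
    """Estimate end-to-end latency as the sum of per-stage table entries."""
    return sum(LATENCY_LUT_MS[(stage, w)] for stage, w in enumerate(widths))

def predict_accuracy(widths):
    """Toy stand-in for a trained accuracy predictor (not the paper's model)."""
    # Wider stages score higher with diminishing returns; later stages weigh more.
    return sum((stage + 1) * log(w) for stage, w in enumerate(widths))

def search(latency_budget_ms):
    """Return the best-scoring candidate within the latency budget, or None."""
    options = [
        sorted(w for s, w in LATENCY_LUT_MS if s == stage)
        for stage in range(NUM_STAGES)
    ]
    feasible = [
        c for c in product(*options)
        if predict_latency_ms(c) <= latency_budget_ms
    ]
    if not feasible:
        return None
    return max(feasible, key=predict_accuracy)
```

Calling `search(6.0)` and then `search(8.0)` reuses the same table and predictor, which is the sense in which the one-time predictor cost is amortized across constraints. Exhaustive enumeration is used here only because the toy space has 27 candidates; a realistic space would need a smarter search strategy.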

This review was created by AI and reviewed by human editors.