QUICK REVIEW

[Paper Review] CondConv: Conditionally Parameterized Convolutions for Efficient Inference

Brandon Yang, Gabriel Bender | arXiv (Cornell University) | 2019-04-09

Advanced Neural Network Applications · References: 45 · Citations: 283

One-Line Summary

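This paper proposes Conditionally Parameterized Convolutions (CondConv), which build the convolutional kernel for each example as a weighted sum of expert kernels. This increases model capacity without a proportional increase in inference cost, and it improves accuracy across a range of architectures on ImageNet and COCO.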

ABSTRACT

Convolutional layers are one of the basic building blocks of modern deep neural networks. One fundamental assumption is that convolutional kernels should be shared for all examples in a dataset. We propose conditionally parameterized convolutions (CondConv), which learn specialized convolutional kernels for each example. Replacing normal convolutions with CondConv enables us to increase the size and capacity of a network, while maintaining efficient inference. We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks. On ImageNet classification, our CondConv approach applied to EfficientNet-B0 achieves state-of-the-art performance of 78.3% accuracy with only 413M multiply-adds. Code and checkpoints for the CondConv Tensorflow layer and CondConv-EfficientNet models are available at: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv.

Motivation and Goals

"Challenge the assumption of shared convolutional kernels across all examples."
"Increase model capacity and performance without a large rise in inference cost."
"Demonstrate CondConv as a drop-in replacement across CNN architectures."
"Show improved ImageNet classification and COCO detection with CondConv-enabled models."

Proposed Method

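Parameterize each convolutional kernel as a linear combination of n expert kernels: $\mathrm{Output}(x) = \sigma\big((\alpha_1 W_1 + \cdots + \alpha_n W_n) * x\big)$, where $\sigma$ is an activation function, $*$ denotes convolution, and each $\alpha_i$ is an example-dependent scalar weight.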
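Compute the per-example routing weights $\alpha_i = r_i(x)$ with global average pooling, a fully connected layer, and a sigmoid: $r(x) = \mathrm{Sigmoid}(\mathrm{GlobalAveragePool}(x)\, R)$, where $R$ is a learned routing matrix.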
Share routing weights across layers within a block to regularize and stabilize training.
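Train with either the per-example kernel computation or the equivalent linear mixture-of-experts formulation, $\sigma\big(\alpha_1 (W_1 * x) + \cdots + \alpha_n (W_n * x)\big)$, whichever is more efficient. The two are equal because convolution is linear in the kernel.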
Apply CondConv to multiple architectures (MobileNetV1/V2, ResNet-50, MnasNet, EfficientNet) and evaluate on ImageNet and COCO.
Regularize with dropout, AutoAugment, Mixup, and Shake-Shake-inspired expert dropout as needed.
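The forward pass described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' TensorFlow layer. The helper names (`conv2d`, `routing_weights`, `condconv`, `mixture_of_experts`) are hypothetical. $\sigma$ is taken to be ReLU only for concreteness. The convolution is a stride-1, no-padding cross-correlation without bias. All tensors are small random nested lists. The sketch checks that mixing the kernels first and convolving once gives the same output as running n convolutions and mixing their results.

```python
import math
import random


def conv2d(x, w):
    """Stride-1, no-padding 2D cross-correlation without bias.

    x: [C_in][H][W] nested lists; w: [C_out][C_in][k][k].
    Returns a [C_out][H-k+1][W-k+1] nested list.
    """
    c_in, h, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for w_o in w:  # one output plane per output channel
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(width - k + 1):
                s = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            s += w_o[c][di][dj] * x[c][i + di][j + dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def routing_weights(x, r_matrix):
    """alpha = Sigmoid(GlobalAveragePool(x) @ R), with R of shape [C_in][n]."""
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    n = len(r_matrix[0])
    logits = [sum(p * r_matrix[c][e] for c, p in enumerate(pooled)) for e in range(n)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def scale_add(tensors, coeffs):
    """Elementwise sum_i coeffs[i] * tensors[i] over equally shaped nested lists."""
    if isinstance(tensors[0], list):
        return [scale_add([t[idx] for t in tensors], coeffs) for idx in range(len(tensors[0]))]
    return sum(a * t for a, t in zip(coeffs, tensors))


def relu(t):
    """Assumed activation (sigma); any elementwise activation would work here."""
    return [relu(v) for v in t] if isinstance(t, list) else max(0.0, t)


def condconv(x, experts, r_matrix):
    """Per-example kernel: mix the n expert kernels first, then run ONE convolution."""
    alpha = routing_weights(x, r_matrix)
    return relu(conv2d(x, scale_add(experts, alpha)))


def mixture_of_experts(x, experts, r_matrix):
    """Equivalent form: run n convolutions, then mix their outputs."""
    alpha = routing_weights(x, r_matrix)
    return relu(scale_add([conv2d(x, w) for w in experts], alpha))


def rand_tensor(*shape):
    """Nested list of uniform(-1, 1) values with the given shape."""
    if len(shape) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]


random.seed(0)
C_IN, C_OUT, K, H, WIDTH, N_EXPERTS = 2, 3, 3, 5, 5, 4
x = rand_tensor(C_IN, H, WIDTH)
experts = [rand_tensor(C_OUT, C_IN, K, K) for _ in range(N_EXPERTS)]
r_matrix = rand_tensor(C_IN, N_EXPERTS)

a = condconv(x, experts, r_matrix)
b = mixture_of_experts(x, experts, r_matrix)
max_diff = max(
    abs(p - q)
    for plane_a, plane_b in zip(a, b)
    for row_a, row_b in zip(plane_a, plane_b)
    for p, q in zip(row_a, row_b)
)
print(max_diff < 1e-9)  # True
```

The first form runs one convolution per example no matter how large n is. For that reason, adding experts mostly adds parameters rather than inference cost. The second form runs n convolutions, but every example in a batch shares the same expert kernels, so those convolutions can be batched as ordinary layers.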

Experimental Results

Research Questions

RQ1: Does CondConv improve accuracy with only modest increases in inference cost across diverse CNN architectures?
RQ2: How does increasing the number of experts per CondConv layer affect performance and efficiency?
RQ3: Where in the network should CondConv be applied to maximize the accuracy-cost trade-off?
RQ4: What do the learned routing weights look like, and how interpretable are they across classes?
RQ5: How does CondConv perform on both image classification (ImageNet) and object detection (COCO)?

Key Results

CondConv consistently improves top-1 accuracy across MobileNetV1, MobileNetV2, MnasNet-A1, ResNet-50, and EfficientNet-B0 with less than 10% inference cost increase.
On ImageNet, CondConv-EfficientNet-B0 with 8 experts reaches 78.3% top-1 accuracy at 413M multiply-adds (MADDs), and CondConv-EfficientNet-B0-depth reaches 79.5% at 614M MADDs.
CondConv-augmented models achieve higher COCO minival mAP than their baselines at comparable or lower MADDs. For example, an SSD300 detector with a CondConv-MobileNetV1 (0.75x) backbone reaches higher mAP at a similar cost.
Routing weights become more class-specific in deeper layers, and the final-layer expert weights follow a bimodal distribution, which indicates that the experts specialize.
Applying CondConv to all layers gives the best accuracy, although converting the earliest layers adds little further gain.
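A back-of-the-envelope count shows why the extra inference cost stays small. In this sketch, the helper `condconv_layer_madds` and the layer shape are hypothetical: a 3×3 convolution with 64 input and 64 output channels, a 14×14 output map, and 8 experts. It counts only multiply-adds for a standard, non-grouped convolution. Bias, normalization, activation, and pooling adds are ignored. Mixing the kernels costs one multiply-add per expert parameter, once per example. The convolution, by contrast, applies its kernel at every output position.

```python
def condconv_layer_madds(h_out, w_out, k, c_in, c_out, n_experts):
    """Per-example multiply-add counts for one standard (non-grouped) CondConv layer.

    Hypothetical helper: ignores bias, normalization, activation, and the
    additions in global average pooling.
    """
    kernel_params = k * k * c_in * c_out
    conv = h_out * w_out * kernel_params  # one convolution with the mixed kernel
    mixing = n_experts * kernel_params    # alpha_1*W_1 + ... + alpha_n*W_n
    routing = c_in * n_experts            # fully connected layer on the pooled vector
    moe = n_experts * conv                # n separate convolutions (mixture-of-experts form)
    return conv, mixing, routing, moe


conv, mixing, routing, moe = condconv_layer_madds(14, 14, 3, 64, 64, 8)
print(conv, mixing, routing)  # 7225344 294912 512
print(f"CondConv overhead: {(mixing + routing) / conv:.2%}")  # CondConv overhead: 4.09%
print(f"mixture-of-experts form: {moe / conv:.0f}x the convolution cost")  # mixture-of-experts form: 8x the convolution cost
```

Relative to the convolution, the mixing term scales as `n_experts / (h_out * w_out)`. As a result, the relative overhead grows on small feature maps and shrinks on large ones.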


This review was generated by AI and reviewed by a human editor.