QUICK REVIEW

[Paper Review] MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

Yunyang Xiong, Hanxiao Liu|arXiv (Cornell University)|2020. 04. 30.

Advanced Neural Network Applications · References: 44 · Citations: 41

One-Line Summary

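MobileDets introduce a platform-aware NAS search space that includes regular convolutions alongside inverted bottleneck blocks, achieving state-of-the-art latency-accuracy trade-offs for mobile object detection on CPUs, EdgeTPUs, DSPs, and edge GPUs.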

ABSTRACT

Inverted bottleneck layers, which are built upon depthwise convolutions, have been the predominant building blocks in state-of-the-art object detection models on mobile devices. In this work, we investigate the optimality of this design pattern over a broad range of mobile accelerators by revisiting the usefulness of regular convolutions. We discover that regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators, provided that they are placed strategically in the network via neural architecture search. By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators. On the COCO object detection task, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on Google EdgeTPU, 3.4 mAP on Qualcomm Hexagon DSP and 2.7 mAP on Nvidia Jetson GPU without increasing latency. Moreover, MobileDets are comparable with the state-of-the-art MnasFPN on mobile CPUs even without using the feature pyramid, and achieve better mAP scores on both EdgeTPUs and DSPs with up to 2x speedup. Code and models are available in the TensorFlow Object Detection API: https://github.com/tensorflow/models/tree/master/research/object_detection.

Motivation and Goals

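Argues that the building blocks of mobile detectors need to be re-evaluated beyond inverted bottleneck (IBN) blocks when targeting modern accelerators.
Proposes an augmented search space (MobileDet) that includes regular convolutions and Tucker-based blocks to improve the latency-accuracy trade-off.
Demonstrates that architecture search performed directly on the object detection task yields better results on mobile hardware than backbone-only NAS.
Shows that MobileDets achieve state-of-the-art or competitive mAP at low latency across multiple hardware platforms.
Releases code and models in the TensorFlow Object Detection API to encourage wider adoption.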

Proposed Method

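Introduces the MobileDet search space, which augments IBN blocks with blocks built on regular convolutions (fused inverted bottlenecks and Tucker convolutions).
Describes two flexible building blocks: (i) the fused inverted bottleneck, in which the 1x1 expansion convolution and the KxK depthwise convolution of an IBN are replaced by a single regular KxK convolution, followed by a 1x1 projection; and (ii) the Tucker convolution, which compresses channels through a 1x1, KxK, 1x1 sequence of regular convolutions.
Incorporates these blocks into TuNAS, a latency-aware NAS framework whose platform-aware reward combines mAP and latency.
Trains a cost model c(·) that predicts hardware latency from per-layer architecture decisions, enabling fast NAS without benchmarking every candidate on the device.
Runs the search on COCO with a detection-specific objective (SSDLite head); the final architecture found for each target hardware is then retrained from scratch and evaluated.
Reports latency benchmarks on TF-Lite, EdgeTPU, DSP, and GPU backends.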
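To make the structural difference between the three block types concrete, the sketch below counts weights and multiply-accumulates (MACs) for an IBN, a fused IBN, and a Tucker convolution. It assumes stride 1, same padding, and ignores bias, batch-norm, and activation costs. The helper names (`Cost`, `conv`, `depthwise`, `ibn`, `fused_ibn`, `tucker`) are hypothetical, as is the parameterization: an expansion ratio e relative to the input channels for IBN and fused IBN, and an input compression ratio s and output compression ratio e for Tucker. The example sizes and ratios are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    params: int  # weight count (no bias / batch norm)
    macs: int    # multiply-accumulates over one h x w feature map


def conv(h: int, w: int, k: int, c_in: int, c_out: int) -> Cost:
    """Regular k x k convolution, stride 1, same padding."""
    params = k * k * c_in * c_out
    return Cost(params, h * w * params)


def depthwise(h: int, w: int, k: int, c: int) -> Cost:
    """k x k depthwise convolution: one k x k filter per channel."""
    params = k * k * c
    return Cost(params, h * w * params)


def total(*parts: Cost) -> Cost:
    """Sum the costs of the layers that make up a block."""
    return Cost(sum(p.params for p in parts), sum(p.macs for p in parts))


def ibn(h: int, w: int, c_in: int, c_out: int, k: int, e: float) -> Cost:
    """Inverted bottleneck: 1x1 expand -> k x k depthwise -> 1x1 project."""
    mid = int(c_in * e)
    return total(conv(h, w, 1, c_in, mid), depthwise(h, w, k, mid),
                 conv(h, w, 1, mid, c_out))


def fused_ibn(h: int, w: int, c_in: int, c_out: int, k: int, e: float) -> Cost:
    """Fused IBN: the 1x1 expand + depthwise pair becomes one regular k x k conv."""
    mid = int(c_in * e)
    return total(conv(h, w, k, c_in, mid), conv(h, w, 1, mid, c_out))


def tucker(h: int, w: int, c_in: int, c_out: int, k: int,
           s: float, e: float) -> Cost:
    """Tucker conv: 1x1 to s*c_in -> k x k regular conv to e*c_out -> 1x1 to c_out."""
    a, b = int(c_in * s), int(c_out * e)
    return total(conv(h, w, 1, c_in, a), conv(h, w, k, a, b),
                 conv(h, w, 1, b, c_out))


if __name__ == "__main__":
    # Illustrative setting only (hypothetical, not from the paper).
    for name, cost in [("IBN", ibn(40, 40, 32, 32, 3, 4)),
                       ("Fused IBN", fused_ibn(40, 40, 32, 32, 3, 4)),
                       ("Tucker", tucker(40, 40, 32, 32, 3, 0.25, 0.75))]:
        print(f"{name:10s} params={cost.params:6d} MACs={cost.macs:,}")
```

With this illustrative setting (40x40 map, 32 input and output channels, K = 3, e = 4 for the IBN variants, s = 0.25 and e = 0.75 for Tucker), the fused IBN needs roughly 4.4x the MACs of the IBN (65.5M vs. 15.0M) and the Tucker block about 4.4M. MAC counts alone therefore favor depthwise-based blocks; the review's point is that on accelerators such as EdgeTPU and DSP, measured latency does not necessarily follow these counts, which is why the search is driven by a latency model rather than by MACs.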
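The cost model and platform-aware reward described above can be sketched as follows. As we understand the paper, c(·) is a linear regression over per-layer indicator features (layer type crossed with input/output channel sizes), and TuNAS uses an absolute-value reward r(α) = mAP(α) + τ|c(α)/c0 - 1| with τ < 0 and a latency target c0. All weights, the bias, τ, the mAP value, and the function names below are hypothetical placeholders, not values from the paper, and the regression-fitting step is omitted: with one-hot features, a prediction reduces to summing the learned weight of each active (type, c_in, c_out) entry.

```python
from typing import Dict, List, Tuple

# One layer decision: (block type, input channels, output channels).
Layer = Tuple[str, int, int]

# HYPOTHETICAL learned weights in ms, one per indicator feature.
# With one-hot (type x c_in x c_out) features, a linear model's prediction
# is the bias plus the weights of the features that are switched on.
WEIGHTS: Dict[Layer, float] = {
    ("ibn", 32, 32): 0.75,
    ("fused_ibn", 32, 32): 1.25,
    ("tucker", 32, 32): 0.5,
    ("ibn", 32, 64): 1.0,
    ("fused_ibn", 32, 64): 1.5,
    ("tucker", 32, 64): 0.75,
}
BIAS_MS = 2.0  # HYPOTHETICAL fixed overhead (e.g. detection head, I/O)


def predict_latency(arch: List[Layer]) -> float:
    """c(arch): bias + sum of the per-layer indicator weights."""
    return BIAS_MS + sum(WEIGHTS[layer] for layer in arch)


def reward(map_score: float, latency: float, target: float,
           tau: float = -0.3) -> float:
    """Absolute-value reward: mAP + tau * |latency / target - 1|, tau < 0."""
    if tau >= 0:
        raise ValueError("tau must be negative")
    return map_score + tau * abs(latency / target - 1.0)


if __name__ == "__main__":
    arch = [("fused_ibn", 32, 32), ("tucker", 32, 64)]
    c = predict_latency(arch)  # 2.0 + 1.25 + 0.75 = 4.0 ms
    print(c, reward(0.25, c, target=4.0))  # on budget: no penalty
```

Because the penalty uses an absolute value, an architecture that is much faster than c0 is penalized just like one that is slower, which keeps the search near the latency budget instead of drifting toward the smallest possible model.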

Experimental Results

Research Questions

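RQ1: Can regular convolutions, placed strategically via NAS, improve the latency-accuracy trade-off of mobile object detection across diverse hardware?
RQ2: Does extending the search space beyond IBN layers to include fused regular convolutions and Tucker blocks yield measurable gains on CPUs, EdgeTPUs, DSPs, and GPUs?
RQ3: Are architectures discovered on one hardware platform transferable to other platforms, and where are the limits of that transfer?
RQ4: Does NAS tailored to object detection outperform backbone-only NAS on COCO across multiple edge devices?
RQ5: Does the proposed MobileDet space generalize to unseen hardware (e.g., NVIDIA Jetson GPU) while retaining its gains?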

Key Findings

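MobileDets consistently improve the latency-accuracy trade-off on CPUs, EdgeTPUs, DSPs, and edge GPUs compared with baselines that rely on an IBN-only search space.
On COCO, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU latency, and outperform MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPU, 3.4 mAP on DSP (Qualcomm Hexagon), and 2.7 mAP on edge GPU (Nvidia Jetson), all without increasing latency.
MobileDets are comparable with the state-of-the-art MnasFPN on mobile CPUs and achieve better mAP on EdgeTPUs and DSPs with up to 2x speedup, even without using a feature pyramid.
Including regular convolutions in the search space yields clear gains on non-CPU accelerators (EdgeTPU, DSP), where depthwise convolutions are not well optimized.
Architectures found for EdgeTPU/DSP also transfer relatively well to unseen hardware such as the NVIDIA Jetson Xavier GPU, suggesting that the MobileDet space generalizes.
A search space that includes Tucker and fused blocks (IBN+Fused+Tucker) provides further gains on non-CPU hardware over IBN-only or smaller search spaces.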

This review was generated by AI and reviewed by a human editor.