QUICK REVIEW

[Paper Review] Pelee: A Real-Time Object Detection System on Mobile Devices

Robert J. Wang, Xiang Li | arXiv (Cornell University) | Apr 18, 2018

Advanced Neural Network Applications | References: 19 | Citations: 101

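One-Sentence Summary

This paper proposes Pelee, a real-time SSD-based detection system built on PeleeNet and conventional convolution, which achieves high accuracy at real-time speed on mobile devices and outperforms several mobile detectors in both accuracy and efficiency.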

ABSTRACT

An increasing need of running Convolutional Neural Network (CNN) models on mobile devices with limited computing power and memory resource encourages studies on efficient model design. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and MobileNetV2. However, all these models are heavily dependent on depthwise separable convolution which lacks efficient implementation in most deep learning frameworks. In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves a higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2. Meanwhile, PeleeNet is only 66% of the model size of MobileNet. We then propose a real-time object detection system by combining PeleeNet with Single Shot MultiBox Detector (SSD) method and optimizing the architecture for fast speed. Our proposed detection system, named Pelee, achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on MS COCO dataset at the speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2. The result on COCO outperforms YOLOv2 in consideration of a higher precision, 13.6 times lower computational cost and 11.3 times smaller model size.

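Motivation and Objectives

Address the need to run CNNs in real time on devices with limited computing power and memory.
Design an efficient CNN (PeleeNet) from conventional convolution, without depthwise separable convolution.
Integrate PeleeNet with an optimized SSD-based detector (Pelee) to achieve fast object detection on mobile hardware.
Evaluate the speed-accuracy trade-off on mobile devices and compare against state-of-the-art detectors.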

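Proposed Method

Introduces PeleeNet, a DenseNet-inspired network featuring 2-way dense layers, a stem block, dynamic bottleneck channels, and post-activation for faster inference.
Adopts an SSD-based detector with five feature map scales (19x19, 10x10, 5x5, 3x3, 1x1) and a residual prediction block placed before prediction.
Uses 1x1 convolution kernels for prediction to reduce FLOPs and model size while maintaining accuracy.
Pre-trains PeleeNet on ImageNet, then fine-tunes it for detection and evaluates on the VOC2007 and COCO datasets, using FP16 on the TX2 and CoreML optimization on the iPhone 8.
Compares against MobileNet, ShuffleNet, YOLOv2, and SSD variants.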
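The effect of the five feature map scales and the 1x1 prediction kernels can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' implementation: it counts the spatial locations across the five maps listed above and compares the weight count of a 3x3 and a 1x1 prediction convolution. The channel counts `IN_CHANNELS` and `OUT_CHANNELS` are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
# Side lengths of the five square feature maps listed in the review.
FEATURE_MAP_SIDES = [19, 10, 5, 3, 1]


def total_cells(sides):
    """Number of spatial locations summed over all square feature maps."""
    return sum(side * side for side in sides)


def conv_weights(kernel_size, in_channels, out_channels):
    """Weight count of a standard convolution layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


# ASSUMPTION: hypothetical channel counts, used only to illustrate the ratio.
IN_CHANNELS = 256
OUT_CHANNELS = 126

cells = total_cells(FEATURE_MAP_SIDES)
weights_3x3 = conv_weights(3, IN_CHANNELS, OUT_CHANNELS)
weights_1x1 = conv_weights(1, IN_CHANNELS, OUT_CHANNELS)

print("prediction locations:", cells)
print("3x3 / 1x1 weight ratio:", weights_3x3 // weights_1x1)
```

The ratio is 9 for any choice of channel counts, because only the kernel area changes. The saving for the whole detector is smaller than that, since the prediction layers are only one part of the network.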

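Experimental Results

Research Questions

RQ1: Can a network built only from conventional convolution (no depthwise separable convolution) achieve competitive accuracy on mobile vision tasks at a small model size?
RQ2: Can integrating PeleeNet with an SSD-based detector and specific design choices deliver real-time inference on mobile devices without sacrificing accuracy?
RQ3: How do design choices such as five feature map scales, the residual prediction block, and 1x1 kernels affect accuracy and speed on embedded hardware?
RQ4: How does Pelee compare with YOLOv2 and SSD-MobileNet in mAP and computational cost on the VOC and COCO benchmarks?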

Key Findings

Model	Input size	FLOPs	Model size (parameters)	Training data	mAP on VOC2007 (%)
Pelee (VOC07)	304x304	1,210 M	5.43 M	07+12	70.9
Pelee (COCO)	304x304	1,210 M	5.43 M	07+12+COCO	76.4

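PeleeNet achieves 72.6% top-1 accuracy on ImageNet ILSVRC 2012 with 508 MFLOPs and 2.8M parameters, outperforming MobileNet and ShuffleNet at a comparable or smaller model size.
Pelee, the SSD-based detector built on PeleeNet, achieves 76.4% mAP on VOC2007 and 22.4 mAP on COCO with a far smaller model size and FLOP count than competing detectors.
On real hardware, Pelee runs at 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2 with FP16, faster than the SSD+MobileNet family and often more accurate.
On COCO test-dev2015, Pelee achieves higher mAP than SSD+MobileNet and YOLOv2, while being 3.7x faster than YOLOv2 with an 11.3x smaller model.
Compared with YOLOv2, Pelee reaches higher precision on COCO at 13.6 times lower computational cost.
The residual prediction block and the 1x1 kernel design choices help reduce FLOPs and parameter count while maintaining competitive accuracy.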
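To relate the reported frame rates to a per-frame time budget, the short sketch below converts FPS to average milliseconds per frame. It uses only the two frame rates quoted above; the helper name `frame_time_ms` is introduced here for illustration.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Frame rates reported in the review.
REPORTED_FPS = {"iPhone 8": 23.6, "NVIDIA TX2 (FP16)": 125.0}

for device, fps in REPORTED_FPS.items():
    print(f"{device}: {frame_time_ms(fps):.1f} ms per frame")
```

This works out to roughly 42.4 ms per frame on the iPhone 8 and 8.0 ms per frame on the TX2.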


This review was written by AI and checked by a human editor.