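[Paper Review] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
MCUNetV2 introduces patch-based inference and, through receptive-field redistribution and NAS, sharply reduces peak memory on MCUs, enabling higher-resolution inputs and state-of-the-art accuracy for tiny image classification and object detection.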
Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first several blocks have an order of magnitude larger memory usage than the rest of the network. To alleviate this issue, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. Manually redistributing the receptive field is difficult. We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2. Patch-based inference effectively reduces the peak memory usage of existing networks by 4-8x. Co-designed with neural networks, MCUNetV2 sets a record ImageNet accuracy on MCU (71.8%), and achieves >90% accuracy on the visual wake words dataset under only 32kB SRAM. MCUNetV2 also unblocks object detection on tiny devices, achieving 16.9% higher mAP on Pascal VOC compared to the state-of-the-art result. Our study largely addressed the memory bottleneck in tinyML and paved the way for various vision applications beyond image classification.
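The imbalanced memory distribution described above can be illustrated with a rough calculation. The sketch below is not taken from the paper: the block table (resolutions, channel counts, strides) is hypothetical, and it assumes int8 activations and that a block's working set is simply its input plus output feature map.

```python
def activation_bytes(height, width, channels, bytes_per_elem=1):
    """Size of one feature map, assuming int8 activations by default."""
    return height * width * channels * bytes_per_elem


# Hypothetical backbone: (input resolution, in_channels, out_channels, stride).
# These numbers are illustrative only and do not come from the paper.
HYPOTHETICAL_BLOCKS = [
    (224, 3, 16, 2),
    (112, 16, 24, 2),
    (56, 24, 40, 2),
    (28, 40, 80, 2),
    (14, 80, 160, 2),
]


def block_peak_memory(blocks):
    """Approximate per-block working set as input map + output map."""
    peaks = []
    for res, c_in, c_out, stride in blocks:
        out_res = res // stride
        peaks.append(
            activation_bytes(res, res, c_in)
            + activation_bytes(out_res, out_res, c_out)
        )
    return peaks


if __name__ == "__main__":
    for index, peak in enumerate(block_peak_memory(HYPOTHETICAL_BLOCKS), start=1):
        print(f"block {index}: {peak / 1024:.1f} kB")
```

Under these assumptions the first block needs more than ten times the memory of the last one, which is the kind of imbalance that makes the early stage the bottleneck.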
Motivation and Objectives
- Identify the memory bottlenecks in CNNs deployed on MCUs with extremely limited SRAM.
- Propose a patch-based inference scheme to reduce peak memory without changing model accuracy.
- Automatically co-design architecture and inference scheduling via neural architecture search under MCU constraints.
- Demonstrate gains on ImageNet, Visual Wake Words, Pascal VOC, and other tiny-vision tasks under tight memory budgets.
Proposed Method
- Analyze memory usage in efficient CNN backbones and observe imbalanced activation memory distribution.
- Propose patch-by-patch execution of the initial memory-intensive stage to reduce peak memory.
- Introduce receptive field redistribution to shift computation to later network stages and reduce overlapping overhead.
- Jointly optimize backbone architecture and inference scheduling via neural architecture search under hardware constraints.
- Evaluate patch-based inference with and without receptive-field redistribution across multiple datasets and MCU platforms.
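The patch-by-patch idea in the list above can be sketched in NumPy. This is a minimal simulation under stated assumptions, not the authors' MCU implementation: it uses a single-channel image, unpadded 3x3 convolutions with ReLU, and hypothetical helper names (`conv3x3_valid`, `stage`, `patch_based_stage`). Each output tile is computed from an input crop enlarged by a halo, so neighbouring crops overlap.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3_valid(x, kernel):
    """3x3 cross-correlation without padding: (H, W) -> (H-2, W-2)."""
    windows = sliding_window_view(x, (3, 3))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def stage(x, kernels):
    """Run the whole stage (conv + ReLU per layer) on one input."""
    for kernel in kernels:
        x = np.maximum(conv3x3_valid(x, kernel), 0.0)
    return x


def patch_based_stage(x, kernels, grid):
    """Compute the same stage output one tile at a time.

    Returns the assembled output and the largest input crop (in elements)
    that had to be resident at once.
    """
    halo = 2 * len(kernels)  # each unpadded 3x3 conv consumes 2 rows and 2 columns
    out_h, out_w = x.shape[0] - halo, x.shape[1] - halo
    out = np.empty((out_h, out_w), dtype=x.dtype)
    row_edges = np.linspace(0, out_h, grid + 1).astype(int)
    col_edges = np.linspace(0, out_w, grid + 1).astype(int)
    largest_crop = 0
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            # Input region needed for output tile [r0:r1, c0:c1].
            crop = x[r0:r1 + halo, c0:c1 + halo]
            out[r0:r1, c0:c1] = stage(crop, kernels)
            largest_crop = max(largest_crop, crop.size)
    return out, largest_crop


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((32, 32))
    kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
    full = stage(image, kernels)
    patched, largest_crop = patch_based_stage(image, kernels, grid=2)
    print("outputs match:", np.allclose(full, patched))
    print("full input elements:", image.size, "largest crop:", largest_crop)
```

The tiled result matches the full-image result while only one crop needs to be held at a time; the price is that the halo regions are recomputed for adjacent tiles, which is the overhead that receptive-field redistribution targets.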
Experimental Results
Research Questions
- RQ1: How does imbalanced memory distribution in CNNs constrain MCU-based inference?
- RQ2: Can patch-based inference reduce peak memory without prohibitive recomputation or accuracy loss?
- RQ3: Does redistributing the receptive field further cut computation overhead while preserving performance?
- RQ4: Can joint neural architecture search optimize both model and inference schedule under MCU constraints to maximize accuracy?
Key Results
- Patch-based inference reduces peak memory by 4–8× across studied networks.
- Receptive-field redistribution lowers additional computation to about 3–4% with maintained accuracy.
- On ImageNet, MCUNetV2 achieves a record 71.8% Top-1 accuracy on MCUs under 512kB SRAM/2MB Flash.
- On Visual Wake Words, MCUNetV2 reaches >90% accuracy with under 32kB SRAM.
- For object detection on Pascal VOC, MCUNetV2-H7 attains 68.3% VOC mAP, a 16.9% gain over the previous state of the art under similar constraints.
- MCUNetV2 enables higher-resolution input and dense prediction tasks on tiny devices previously impractical due to memory limits.
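Why a smaller receptive field in the per-patch stage lowers the recomputation overhead can be seen from a simple geometric estimate. The function below is an illustrative assumption, not a formula from the paper: it only compares the input area read by overlapping patches with the input area of a single full pass, for a square output split into a `grid` x `grid` layout with a given halo.

```python
def input_overlap_ratio(out_size, halo, grid):
    """Ratio of total patch input area to full input area.

    out_size: side length of the stage output (square, hypothetical).
    halo: extra rows/columns each patch needs because of the receptive field.
    grid: number of patches per side.
    """
    tile = out_size / grid
    patched_area = grid * grid * (tile + halo) ** 2
    full_area = (out_size + halo) ** 2
    return patched_area / full_area


if __name__ == "__main__":
    for halo in (2, 4, 8):
        ratio = input_overlap_ratio(out_size=28, halo=halo, grid=2)
        print(f"halo={halo}: patches read {ratio:.2f}x the full input area")
```

With a single patch the ratio is 1.0, and it grows with both the halo and the number of patches, which is consistent with shifting receptive field to the later, non-patched stage to keep the overhead small.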
This review was generated by AI and checked by a human editor.