QUICK REVIEW

[Paper Review] No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects

Raja Sunkara, Tie Luo|arXiv (Cornell University)|2022. 08. 07.

Advanced Neural Network Applications | Citations: 46

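One-Line Summary

SPD-Conv replaces strided convolution and pooling with a space-to-depth downsampling layer followed by a non-strided convolution, which improves performance on low-resolution images and small objects. It has been applied to YOLOv5 and ResNet variants, and the code is open source.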

ABSTRACT

Convolutional neural networks (CNNs) have made resounding success in many computer vision tasks such as image classification and object detection. However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small. In this paper, we point out that this roots in a defective yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and learning of less effective feature representations. To this end, we propose a new CNN building block called SPD-Conv in place of each strided convolution layer and each pooling layer (thus eliminates them altogether). SPD-Conv is comprised of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and can be applied in most if not all CNN architectures. We explain this new design under two most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our approach significantly outperforms state-of-the-art deep learning models, especially on tougher tasks with low-resolution images and small objects. We have open-sourced our code at https://github.com/LabSAINT/SPD-Conv.

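Research Motivation and Goals

Identify that conventional CNNs hit a performance limit on low-resolution images and small objects because of strided downsampling and pooling.
Propose SPD-Conv as a general-purpose building block that replaces strided convolution and pooling throughout an architecture.
Demonstrate the effectiveness of SPD-Conv on object detection and image classification tasks.
Show that SPD-Conv integrates easily into popular frameworks, and provide open-source code for reproduction.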

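Proposed Method

Introduce SPD-Conv: a space-to-depth (SPD) layer followed by a non-strided convolution.
The SPD layer rearranges spatial data into the channel dimension, so the feature map is downsampled while its information is preserved.
A non-strided convolution applied after SPD reduces the channel dimension and learns discriminative features.
Replace every strided convolution and pooling layer in existing architectures (e.g., YOLOv5, ResNet) with SPD-Conv.
Provide a scaling strategy (width and depth) for building nano, small, medium, and large SPD-enhanced models.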
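The two steps of SPD-Conv described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `space_to_depth` and `pointwise_conv`, the nested-list `[C][H][W]` layout, the channel ordering after rearrangement, and the use of a 1x1 kernel with hand-picked weights (in place of a learned convolution) are all assumptions made for the example.

```python
def space_to_depth(x, scale=2):
    """Rearrange a [C][H][W] nested list into [C*scale*scale][H/scale][W/scale]."""
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    # Assumed channel order: offsets (dy, dx) outermost, source channel innermost.
    for dy in range(scale):
        for dx in range(scale):
            for channel in x:
                # Keep every scale-th row and column, starting at (dy, dx).
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


def pointwise_conv(x, weights):
    """Stride-1 convolution with a 1x1 kernel: weights[k][c] mixes input channels."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w * x[c][i][j] for c, w in enumerate(kernel)) for j in range(width)]
         for i in range(height)]
        for kernel in weights
    ]


# One 4x4 input channel holding the values 0..15.
feature_map = [[[0, 1, 2, 3],
                [4, 5, 6, 7],
                [8, 9, 10, 11],
                [12, 13, 14, 15]]]

spd = space_to_depth(feature_map, scale=2)              # 1x4x4 -> 4x2x2
reduced = pointwise_conv(spd, weights=[[1, 1, 1, 1]])   # 4 channels -> 1 channel

print(len(spd), len(spd[0]), len(spd[0][0]))  # 4 2 2
print(reduced)                                # [[[10, 18], [42, 50]]]
```

All 16 input values survive the rearrangement (the spatial size halves while the channel count quadruples), whereas a stride-2 convolution with a 1x1 kernel would read only one of every four positions. The stride-1 convolution then decides how to compress the enlarged channel dimension.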

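Experimental Results

Research Questions

RQ1: Does SPD-Conv preserve discriminative information during downsampling better than conventional strided downsampling?
RQ2: Does SPD-Conv improve performance on downstream tasks such as object detection and image classification, particularly for small objects and low-resolution images?
RQ3: How can SPD-Conv be integrated into existing architectures (e.g., YOLOv5, ResNet) and scaled across model sizes?
RQ4: Can SPD-Conv be adopted easily in common deep learning frameworks (PyTorch, TensorFlow) and training pipelines?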

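Key Results

SPD-Conv replaces strided convolution and pooling, downsampling the feature map without losing learnable information.
YOLOv5-SPD and ResNet-SPD, built by applying SPD-Conv, perform better, especially on small objects and low-resolution images.
On COCO val2017, the nano model YOLOv5-SPD-n improves AP_S by up to 13.15 points over the runner-up.
On COCO val2017, the small-scale models show notable AP and AP_S gains from SPD-Conv across variants (e.g., YOLOv5-SPD-s and -m).
On COCO test-dev2017, SPD-Conv models keep the leading AP_S across the nano, small, and large categories and show competitive AP even against baselines that use transfer learning.
On image classification (Tiny ImageNet and CIFAR-10), ResNet18-SPD and ResNet50-SPD achieve higher top-1 accuracy than the baselines.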

This review was generated by AI and checked by a human editor.