QUICK REVIEW

[Paper Review] ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect

Dachong Li, Li Li|arXiv (Cornell University)|2024. 01. 23.

CCD and CMOS Imaging Sensors | Citations: 8

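One-Line Summary

The paper introduces a shift-wise operator that replaces large convolutional kernels with small ones by using shifts and sparse group convolution, achieving strong ImageNet performance at lower cost.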

ABSTRACT

Large kernels make standard convolutional neural networks (CNNs) great again over transformer architectures in various vision tasks. Nonetheless, recent studies meticulously designed around increasing kernel size have shown diminishing returns or stagnation in performance. Thus, the hidden factors of large kernel convolution that affect model performance remain unexplored. In this paper, we reveal that the key hidden factors of large kernels can be summarized as two separate components: extracting features at a certain granularity and fusing features by multiple pathways. To this end, we leverage the multi-path long-distance sparse dependency relationship to enhance feature utilization via the proposed Shiftwise (SW) convolution operator with a pure CNN architecture. In a wide range of vision tasks such as classification, segmentation, and detection, SW surpasses state-of-the-art transformers and CNN architectures, including SLaK and UniRepLKNet. More importantly, our experiments demonstrate that $3 \times 3$ convolutions can replace large convolutions in existing large kernel CNNs to achieve comparable effects, which may inspire follow-up works. Code and all the models at https://github.com/lidc54/shift-wiseConv.

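Motivation and Goals

Enable CNNs to achieve large receptive fields without relying on hardware-unfriendly large kernels.
Propose a shift-wise operator that decomposes a large kernel into a shift-based aggregation of small-kernel outputs.
Introduce coarse-grained sparsity (grouped shifts) and reparameterization to improve performance and efficiency.
Achieve accuracy comparable to large-kernel baselines on ImageNet-1K while reducing parameters and computation.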

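Proposed Method

Decompose a large M×N kernel into multiple small k×k kernels and apply a shift to each output to emulate the large-kernel effect.
Introduce sparse group convolution, obtained through pruning, to build long-range dependencies while preserving hardware efficiency.
Combine ghost and reparameterization techniques to fuse the multi-branch design into a single inference path.
Define focus length and focus width to generalize the shift-wise operator to various kernel shapes and sizes.
Apply a sparse mask shared across branches so that reparameterization remains possible while the structure is preserved at inference.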
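
The shift-and-add idea behind the first item above can be illustrated in one dimension. The sketch below is not the authors' implementation: it is a simplified pure-Python illustration that assumes a dense 1D kernel whose length is an odd multiple of an odd chunk size k, and it leaves out sparse grouping, pruning, and reparameterization entirely. The function names `correlate1d` and `shiftwise1d` are hypothetical.

```python
def correlate1d(x, kernel):
    """Same-size 1D cross-correlation with zero padding (odd kernel length)."""
    r = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < n:  # positions outside the signal count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


def shiftwise1d(x, large_kernel, k=3):
    """Emulate one large kernel with several k-tap kernels plus shifts.

    Assumption for this sketch: k is odd and len(large_kernel) is an odd
    multiple of k, so every chunk has an integer offset from the center.
    """
    m = len(large_kernel)
    n_chunks = m // k
    assert k % 2 == 1 and m == n_chunks * k and n_chunks % 2 == 1

    # Pad once so that shifted reads near the borders stay inside the buffer.
    pad = k * (n_chunks // 2)
    x_pad = [0] * pad + list(x) + [0] * pad

    out = [0] * len(x)
    for c in range(n_chunks):
        small = large_kernel[c * k:(c + 1) * k]   # c-th small kernel
        offset = (c - n_chunks // 2) * k          # shift for this chunk
        y = correlate1d(x_pad, small)             # small-kernel response
        for i in range(len(x)):
            out[i] += y[i + pad + offset]         # shift, then accumulate
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    big = [1, -2, 3, 0, 5, -1, 2, 4, -3]          # one 9-tap kernel
    print(shiftwise1d(signal, big) == correlate1d(signal, big))
```

Under these assumptions the shifted sum of three 3-tap responses matches the 9-tap response exactly. The operator summarized above works on 2D kernels and adds sparsity, so this only conveys the basic mechanism.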

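Experimental Results

Research Questions

RQ1: Can combining small convolutional kernels with shift operations reproduce the receptive field of a large kernel?
RQ2: Does coarse-grained sparsity in shift-wise grouping provide sparse long-range dependencies with hardware efficiency?
RQ3: How does the shift-wise operator compare with existing large-kernel CNN methods on ImageNet-1K in terms of parameters, FLOPs, and accuracy?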

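Key Results

The shift-wise module achieves the large-kernel effect with substantially fewer parameters and FLOPs than some large-kernel baselines.
On ImageNet-1K, shift-wise variants achieve competitive accuracy with less computation and fewer parameters than SLaK-style architectures.
Sparse training drives a data-driven, stage-by-stage reduction in active groups, increasing sparsity in early stages and reducing parameters in later stages.
Inference-time reparameterization (ghost/rep) merges the gains of multi-branch training into a single efficient path.
A hardware-friendly implementation based on reparameterized shift-based convolution improves GPU throughput at similar accuracy.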
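
Merging parallel branches into a single inference path relies on the linearity of convolution. As a generic illustration only (this is the standard structural reparameterization idea, not the paper's specific ghost/rep scheme), the sketch below sums the outputs of several 1D branches and shows that one merged kernel gives the same result. The helper names `pad_kernel` and `merge_branches` are hypothetical, and odd kernel lengths are assumed.

```python
def correlate1d(x, kernel):
    """Same-size 1D cross-correlation with zero padding (odd kernel length)."""
    r = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < n:
                acc += w * x[idx]
        out.append(acc)
    return out


def pad_kernel(kernel, size):
    """Zero-pad an odd-length kernel symmetrically up to an odd `size`."""
    extra = (size - len(kernel)) // 2
    return [0] * extra + list(kernel) + [0] * extra


def merge_branches(kernels):
    """Fold parallel branches into one kernel by center-aligned summation."""
    size = max(len(kr) for kr in kernels)
    padded = [pad_kernel(kr, size) for kr in kernels]
    return [sum(ws) for ws in zip(*padded)]


if __name__ == "__main__":
    signal = [3, 1, 4, 1, 5, 9, 2, 6]
    branches = [[1, 2, 3], [4, 0, -1, 2, 5], [7]]  # training-time branches

    # Training-time view: run every branch, then add the outputs.
    multi = [sum(vals) for vals in
             zip(*(correlate1d(signal, kr) for kr in branches))]

    # Inference-time view: a single pass with the merged kernel.
    single = correlate1d(signal, merge_branches(branches))
    print(multi == single)
```

The merged kernel costs one pass at inference regardless of how many branches were used during training, which is the property the multi-branch-to-single-path result depends on.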


This review was generated by AI and reviewed by a human editor.