QUICK REVIEW

[Paper Review] Wavelet Convolutional Neural Networks for Texture Classification

Shin Fujieda, Kohei Takayama | arXiv (Cornell University) | 2017-07-24

Image Retrieval and Classification Techniques | References: 27 | Citations: 100

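One-Line Summary

Introduces wavelet CNNs, which integrate multiresolution spectral analysis into CNNs, improving texture classification accuracy while reducing the number of parameters.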

ABSTRACT

Texture classification is an important and challenging problem in many image processing applications. While convolutional neural networks (CNNs) achieved significant successes for image classification, texture classification remains a difficult problem since textures usually do not contain enough information regarding the shape of object. In image processing, texture classification has been traditionally studied well with spectral analyses which exploit repeated structures in many textures. Since CNNs process images as-is in the spatial domain whereas spectral analyses process images in the frequency domain, these models have different characteristics in terms of performance. We propose a novel CNN architecture, wavelet CNNs, which integrates a spectral analysis into CNNs. Our insight is that the pooling layer and the convolution layer can be viewed as a limited form of a spectral analysis. Based on this insight, we generalize both layers to perform a spectral analysis with wavelet transform. Wavelet CNNs allow us to utilize spectral information which is lost in conventional CNNs but useful in texture classification. The experiments demonstrate that our model achieves better accuracy in texture classification than existing models. We also show that our model has significantly fewer parameters than CNNs, making our model easier to train with less memory.

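Motivation and Objectives

Motivate the integration of spectral analysis into CNNs as a way to improve texture classification.
Reformulate pooling and convolution as generalized filtering and downsampling.
Demonstrate accuracy and parameter efficiency on standard texture datasets.
Compare against AlexNet, T-CNN, and spectral methods to show the benefits of the approach.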
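
The reformulation of pooling as filtering plus downsampling can be checked numerically. The sketch below is a minimal 1D illustration and is not code from the paper: the helper names `conv1d_valid` and `filter_and_downsample` are hypothetical, and the kernels (0.5, 0.5) and (0.5, -0.5) are an unnormalized Haar pair chosen so that the low-pass branch matches average pooling exactly (the orthonormal Haar filters use 1/sqrt(2) instead).

```python
import numpy as np

def conv1d_valid(x, k):
    """Slide kernel k over x without padding (correlation form)."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def filter_and_downsample(x, k, p=2):
    """Generalized layer: filter with kernel k, then keep every p-th sample."""
    return conv1d_valid(x, k)[::p]

x = np.array([4.0, 2.0, 6.0, 8.0, 1.0, 3.0])

# Average pooling with window 2 and stride 2.
avg_pool = x.reshape(-1, 2).mean(axis=1)

# Same values from low-pass filtering with (0.5, 0.5) and downsampling by 2.
low = filter_and_downsample(x, np.array([0.5, 0.5]))

# The matching high-pass branch keeps the detail that pooling throws away.
high = filter_and_downsample(x, np.array([0.5, -0.5]))

# Low and high together recover the input exactly.
reconstructed = np.stack([low + high, low - high], axis=1).ravel()
```

Here `low` equals `avg_pool`, and `reconstructed` equals `x`, which is the sense in which keeping the high-frequency branch avoids the information loss of pooling alone.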

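Proposed Method

Reformulate convolution and pooling as generalized filtering followed by downsampling.
Introduce Haar wavelet multiresolution analysis into the network, using both low-frequency and high-frequency components.
Use a VGG-19-like architecture with 3x3 convolutions, 1x1 padding, and stride-based downsampling.
Insert an energy layer before the fully connected layers to strengthen texture features.
Train both from scratch and with ImageNet pretraining to compare performance.
Implement in Caffe and train on 224x224 inputs with data augmentation and batch normalization.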
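
To make the multiresolution step concrete, here is a minimal NumPy sketch of a multi-level 2D Haar decomposition on a 224x224 input, followed by a simple energy computation. It is an illustration under stated assumptions, not the authors' Caffe implementation: the function names `haar_level`, `haar_pyramid`, and `energy` are hypothetical, the input is random noise, and the energy layer is assumed to be a per-feature-map spatial average (similar to global average pooling), which this review does not spell out.

```python
import numpy as np

def haar_level(img):
    """One level of an orthonormal 2D Haar transform (even height and width)."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0        # low-pass in both directions
    col_diff = (a - b + c - d) / 2.0  # left-minus-right detail
    row_diff = (a + b - c - d) / 2.0  # top-minus-bottom detail
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return ll, (col_diff, row_diff, diag)

def haar_pyramid(img, levels):
    """Repeatedly decompose the low-frequency band; collect details per level."""
    ll = img.astype(float)
    details = []
    for _ in range(levels):
        ll, d = haar_level(ll)
        details.append(d)
    return ll, details

def energy(feature_maps):
    """Assumed energy layer: spatial mean per map, (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224))  # stand-in for one input channel

ll, details = haar_pyramid(image, levels=4)
coarsest_energy = energy(np.stack(details[-1]))
```

With four levels, the low-frequency band shrinks from 224x224 to 14x14, while each level contributes three high-frequency bands. Because the transform is orthonormal, the sum of squared coefficients across all bands equals the sum of squared pixels in the input.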

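Experimental Results

Research Questions

RQ1: Can wavelet-based multiresolution analysis inside a CNN improve texture classification accuracy over conventional CNNs?
RQ2: Does introducing high-frequency components alongside low-frequency components preserve information and improve robustness to texture variation?
RQ3: How do wavelet CNNs compare with existing spectral and CNN-based texture methods in accuracy and parameter efficiency?
RQ4: How does the number of decomposition levels affect texture classification performance?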

Key Results

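Main results table (from the original paper). Values are classification accuracy in % (mean ± standard deviation) for networks trained from scratch; the 1-level to 5-level columns are wavelet CNNs with that many decomposition levels.
Dataset	AlexNet	T-CNN	1-level	2-level	3-level	4-level	5-level
KTH-TIPS2-b	48.3 ± 1.4	49.6 ± 0.6	57.5 ± 3.0	57.0 ± 2.3	57.8 ± 2.5	60.5 ± 2.1	59.6 ± 2.5
DTD	22.7 ± 1.3	27.8 ± 1.2	29.0 ± 1.4	30.3 ± 0.9	31.6 ± 1.0	32.2 ± 0.8	32.2 ± 0.7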
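
The table can be summarized with a few lines of arithmetic. The sketch below copies the mean values from the table (standard deviations omitted) and reports, for each dataset, the stronger baseline, the best wavelet CNN level, and the gap between them. The `results` dictionary and the `summarize` helper are hypothetical names introduced only for this illustration.

```python
# Mean accuracies copied from the table above (standard deviations omitted).
results = {
    "KTH-TIPS2-b": {"AlexNet": 48.3, "T-CNN": 49.6, "1-level": 57.5,
                    "2-level": 57.0, "3-level": 57.8, "4-level": 60.5,
                    "5-level": 59.6},
    "DTD": {"AlexNet": 22.7, "T-CNN": 27.8, "1-level": 29.0,
            "2-level": 30.3, "3-level": 31.6, "4-level": 32.2,
            "5-level": 32.2},
}
baselines = ("AlexNet", "T-CNN")

def summarize(row):
    """Return (best baseline, best wavelet level, gain of the level over it)."""
    best_base = max(baselines, key=row.get)
    wavelet = {name: acc for name, acc in row.items() if name not in baselines}
    # On ties, max() keeps the first (shallowest) level in column order.
    best_level = max(wavelet, key=wavelet.get)
    return best_base, best_level, round(wavelet[best_level] - row[best_base], 1)

summary = {dataset: summarize(row) for dataset, row in results.items()}
for dataset, (base, level, gain) in summary.items():
    print(f"{dataset}: {level} beats {base} by {gain} points")
```

On these numbers the 4-level model leads T-CNN by 10.9 points on KTH-TIPS2-b and by 4.4 points on DTD, where the 5-level model ties it. The reported standard deviations (up to 3.0 points on KTH-TIPS2-b) are comparable to the differences between adjacent levels, so the level ranking should be read with some caution.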

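When trained from scratch, wavelet CNNs outperform AlexNet and T-CNN on the texture datasets at every decomposition level tested.
Four-level decomposition often gives the best trade-off between accuracy and parameter count.
With ImageNet pretraining, wavelet CNNs rank at or near the top on KTH-TIPS2-b and remain competitive on DTD, with far fewer parameters than FV-CNN.
Wavelet CNNs have markedly fewer trainable parameters than competing models (for example, under 90 MB in some configurations).
Four-level decomposition performed strongly, while the fifth level showed diminishing returns relative to the parameters it adds.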


This review was generated by AI and reviewed by a human editor.