QUICK REVIEW

[Paper Review] Efficient Deep Aesthetic Image Classification using Connected Local and Global Features

Xin Jin, Le Wu|arXiv (Cornell University)|2016. 10. 07.

Visual Attention and Saliency Detection | References: 44 | Citations: 5

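One-Line Summary

This paper proposes ILGNet, a lightweight deep convolutional neural network that uses the Inception module from GoogLeNet and connects intermediate local layers to the global layer to combine local and global features. It reports state-of-the-art results for aesthetic image classification on the AVA benchmark while requiring significantly less training and test time than the full GoogLeNet, at only a small cost in accuracy. Its accuracy is similar to that of a 2/3 GoogLeNet, whose computational cost is nearly twice that of ILGNet.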

ABSTRACT

In this paper we investigate the aesthetic image classification problem, also known as automatically classifying an image into low or high aesthetic quality, which is quite a challenging problem. Considering both the local and global information of images is quite important for image aesthetic quality assessment. Currently, a powerful inception module is proposed which shows very high performance in object classification. We have the observation that the inception module has the ability of considering both the local and global features in nature. Thus, in this paper, we propose a novel DCNN structure codenamed ILGNet for image aesthetics classification, which introduces the Inception module and connects intermediate Local layers to the Global layer for the output. In addition, the ILGNet is derived from part of the GoogLeNet. Thus, we can easily use a pre-trained image classification GoogLeNet model on the ImageNet dataset and fine tune our connected local and global layer on the large scale aesthetics assessment AVA dataset. The experimental results show that the proposed ILGNet outperforms the state of the art results in image aesthetics assessment in the AVA benchmark. The time cost of both training and test of the ILGNet are significantly less than those of full GoogLeNet with only a little reduction of the classification accuracy. Our ILGNet can achieve similar classification accuracy as that of 2/3 GoogLeNet, whose computational cost is nearly twice of ours. This makes the aesthetic assessment model more easily to be integrated into mobile and embedded systems.

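Research Motivation and Goals

To enable efficient and accurate deep-learning-based assessment of image aesthetic quality.
To use both local and global image features to improve aesthetic classification performance.
To reduce computational cost relative to the full GoogLeNet while keeping accuracy high.
To make the aesthetic assessment model easy to deploy on mobile and embedded systems.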

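Proposed Method

Proposes ILGNet, a new DCNN architecture that builds on the multi-scale feature extraction of the Inception module and connects intermediate local layers to the global classification layer.
Takes a GoogLeNet model pre-trained on ImageNet and fine-tunes it for aesthetic classification on the large-scale AVA dataset.
Fuses features from several stages of the network, combining local representations with the final global layer to improve classification.
Relies on the design of the Inception module to capture both local texture and global structural information in an image.
Uses transfer learning to speed up training and improve generalization on the aesthetic classification task.
Keeps only part of GoogLeNet, which lowers training and inference cost relative to the full network.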
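
The idea of connecting local layers to the global layer can be illustrated with a minimal sketch. It is not the authors' implementation: the global average pooling step, the feature-map sizes, the random weights, and every function name here are assumptions made only for illustration. It pools two intermediate ("local") feature maps and one final ("global") feature map, concatenates them, and feeds the fused vector to a two-class (low vs. high quality) classifier.

```python
import math
import random


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(vector, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_connected(local_maps, global_map, weights, bias):
    """Concatenate pooled local and global features, then score low vs. high."""
    fused = []
    for fmap in local_maps + [global_map]:
        fused.extend(global_average_pool(fmap))
    return softmax(linear(fused, weights, bias))


def random_map(channels, size, rng):
    """Stand-in for a convolutional feature map (random values, not real features)."""
    return [
        [[rng.random() for _ in range(size)] for _ in range(size)]
        for _ in range(channels)
    ]


rng = random.Random(0)
# Two intermediate stages (local) and one final stage (global); sizes are made up.
local_maps = [random_map(4, 8, rng), random_map(6, 4, rng)]
global_map = random_map(8, 2, rng)
fused_dim = 4 + 6 + 8
# Untrained placeholder weights for the two-class output layer.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(fused_dim)] for _ in range(2)]
bias = [0.0, 0.0]

probs = classify_connected(local_maps, global_map, weights, bias)
print(probs)  # two probabilities: [P(low quality), P(high quality)]
```

In a fine-tuning setup as described above, the feature maps would come from pre-trained GoogLeNet layers rather than random values, and only the weights would be learned on AVA.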

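Experimental Results

Research Questions

RQ1: Can a lightweight CNN architecture that integrates local and global features outperform existing models on aesthetic image classification?
RQ2: How does fusing intermediate local features with the global classifier affect accuracy and efficiency?
RQ3: To what extent can a fine-tuned GoogLeNet-based model reduce computational cost while still achieving high accuracy?
RQ4: Is such a model efficient enough to be deployed effectively on mobile and embedded systems?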

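Key Findings

ILGNet achieves state-of-the-art performance on the AVA benchmark for aesthetic image classification.
Training and test time are significantly lower than for the full GoogLeNet, with only a small drop in classification accuracy.
ILGNet reaches accuracy similar to that of a 2/3 GoogLeNet, whose computational cost is nearly twice that of ILGNet.
Connecting local and global features improves the feature representation and leads to better classification results.
The low computational load makes the model easier to integrate into mobile and embedded systems.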
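
For readers unfamiliar with how a binary aesthetic classification accuracy is computed, the following is a small sketch. It assumes the common convention of labeling an image as high quality when its mean rating exceeds 5 on a 1 to 10 scale; the review above does not state the threshold the paper used, and the ratings and predictions below are made up, not AVA data or results from the paper.

```python
def binary_label(mean_score, threshold=5.0):
    """Return 1 (high quality) if the mean rating exceeds the threshold, else 0."""
    return 1 if mean_score > threshold else 0


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical mean ratings and model outputs, for illustration only.
mean_scores = [6.2, 4.1, 5.4, 3.8]
predictions = [1, 0, 0, 0]

labels = [binary_label(s) for s in mean_scores]
acc = accuracy(predictions, labels)
print(labels, acc)
```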

This review was generated by AI and reviewed by a human editor.