QUICK REVIEW

[Paper Review] AOGNets: Deep AND-OR Grammar Networks for Visual Recognition

Xilai Li, Tianfu Wu | arXiv (Cornell University) | November 15, 2017

Advanced Image and Video Retrieval Techniques | References: 35 | Citations: 8

One-Line Summary

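AOGNets are deep network architectures for visual recognition built from hierarchical, compositional AND-OR grammar (AOG) building blocks. Each block combines AND-nodes (feature concatenation), OR-nodes (element-wise feature summation), and Terminal-nodes (channel-wise slicing of the block input), and the whole network is trained end to end. AOGNets outperform ResNet and perform comparably to DenseNet on CIFAR-10, CIFAR-100, and ImageNet-1K. They also improve Faster R-CNN object detection on PASCAL VOC.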

ABSTRACT

This paper presents a method of learning deep AND-OR Grammar (AOG) networks for visual recognition, which we term AOGNets. An AOGNet consists of a number of stages each of which is composed of a number of AOG building blocks. An AOG building block is designed based on a principled AND-OR grammar and represented by a hierarchical and compositional AND-OR graph. Each node applies some basic operation (e.g., Conv-BatchNorm-ReLU) to its input. There are three types of nodes: an AND-node explores composition, whose input is computed by concatenating features of its child nodes; an OR-node represents alternative ways of composition in the spirit of exploitation, whose input is the element-wise sum of features of its child nodes; and a Terminal-node takes as input a channel-wise slice of the input feature map of the AOG building block. AOGNets aim to harness the best of two worlds (grammar models and deep neural networks) in representation learning with end-to-end training. In experiments, AOGNets are tested on three highly competitive image classification benchmarks: CIFAR-10, CIFAR-100 and ImageNet-1K. AOGNets obtain better performance than the widely used Residual Net and its variants, and are tightly comparable to the Dense Net. AOGNets are also tested in object detection on the PASCAL VOC 2007 and 2012 using the vanilla Faster RCNN system and obtain better performance than the Residual Net.

Research Motivation and Objectives

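To develop a deep neural network architecture that combines structured grammar models with end-to-end learning in order to improve visual representations.
To address the limitations of fixed architectures such as ResNet by introducing a compositional, hierarchical structure that models both composition and alternatives in feature learning.
To achieve competitive performance on image classification and object detection benchmarks without architectural changes to the standard systems around the backbone (detection uses the vanilla Faster R-CNN system).
To show that grammar-based hierarchical composition can improve feature learning in deep neural networks.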

Proposed Method

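An AOGNet consists of a number of stages, each composed of a number of AOG building blocks, and each block is represented by a hierarchical AND-OR graph.
Each AOG building block uses three node types: an AND-node composes features by concatenating the features of its child nodes, an OR-node represents alternative ways of composition by taking the element-wise sum of its child nodes' features, and a Terminal-node takes a channel-wise slice of the block's input feature map.
Each node applies a standard deep learning operation (e.g., Conv-BatchNorm-ReLU) to its input.
The network is trained end to end, so the hierarchical grammar structure and the feature-learning components are optimized together.
Within the feature hierarchy, AND-nodes support exploration and OR-nodes support exploitation, a design this review likens to human visual perception.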
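
The three node types described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes a toy feature representation (one float per channel at a single spatial position), replaces the per-node Conv-BatchNorm-ReLU with a hypothetical `node_op` placeholder that only applies ReLU, and wires a deliberately tiny two-slice graph. The names `node_op`, `terminal_node`, `and_node`, `or_node`, `tiny_aog_block`, and `tiny_stage` are all made up for illustration.

```python
from typing import List

# Toy feature: one float per channel at a single spatial position (an assumption
# made to keep the sketch free of tensor libraries).
Feature = List[float]


def node_op(x: Feature) -> Feature:
    """Hypothetical stand-in for the per-node operation.

    The abstract gives Conv-BatchNorm-ReLU as an example; this placeholder
    applies only ReLU.
    """
    return [max(0.0, v) for v in x]


def terminal_node(block_input: Feature, start: int, stop: int) -> Feature:
    """Terminal-node: takes a channel-wise slice of the block's input."""
    return node_op(block_input[start:stop])


def and_node(children: List[Feature]) -> Feature:
    """AND-node: its input is the concatenation of its children's features."""
    concatenated = [v for child in children for v in child]
    return node_op(concatenated)


def or_node(children: List[Feature]) -> Feature:
    """OR-node: its input is the element-wise sum of its children's features."""
    width = len(children[0])
    if any(len(child) != width for child in children):
        raise ValueError("OR-node children must have the same number of channels")
    summed = [sum(values) for values in zip(*children)]
    return node_op(summed)


def tiny_aog_block(x: Feature) -> Feature:
    """A two-slice block (assumed layout, chosen only for illustration).

    The root OR-node sums two alternatives: a Terminal-node over all channels,
    and an AND-node that concatenates Terminal-nodes over each half.
    """
    half = len(x) // 2
    t_left = terminal_node(x, 0, half)
    t_right = terminal_node(x, half, len(x))
    t_full = terminal_node(x, 0, len(x))
    composed = and_node([t_left, t_right])
    return or_node([t_full, composed])


def tiny_stage(x: Feature, num_blocks: int) -> Feature:
    """A stage modeled as a plain sequence of blocks (a simplification)."""
    for _ in range(num_blocks):
        x = tiny_aog_block(x)
    return x


if __name__ == "__main__":
    print(tiny_aog_block([1.0, -2.0, 3.0, -4.0]))
```

Because the AND-node concatenates the two half-width slices, its output has the same channel count as the full-width Terminal-node, which is what lets the OR-node sum the two alternatives element-wise.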

Experimental Results

Research Questions

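RQ1: Can a grammar-based hierarchical structure improve deep learning representations for visual recognition tasks?
RQ2: How does integrating an AND-OR grammar into a deep neural network affect performance on standard benchmarks?
RQ3: Does the compositional AND-OR structure match or surpass modern residual and dense networks?
RQ4: Can the AOGNet architecture improve object detection performance within the standard Faster R-CNN framework?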

Key Findings

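AOGNets achieve higher classification accuracy than Residual Networks and their variants on CIFAR-10, CIFAR-100, and ImageNet-1K.
On the same benchmarks, AOGNets' performance is closely comparable to that of DenseNet, a state-of-the-art architecture.
When used within the Faster R-CNN framework, AOGNets improve object detection performance on PASCAL VOC 2007 and 2012 over ResNet-based models.
End-to-end training of AOGNets successfully optimized the hierarchical grammar structure together with the feature-learning components.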

This review was generated by AI and reviewed by a human editor.