QUICK REVIEW

[Paper Review] B-CNN: Branch Convolutional Neural Network for Hierarchical Classification

Xinqi Zhu, Michael Bain | arXiv (Cornell University) | 2017-09-28

Advanced Neural Network Applications | References: 29 | Citations: 108

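One-Line Summary

B-CNN adds branch outputs to a CNN so that it predicts at successive coarse-to-fine levels of a class hierarchy, and it is trained with the Branch Training strategy (BT-strategy); it improves over the corresponding baseline CNNs on MNIST, CIFAR-10, and CIFAR-100.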

ABSTRACT

Convolutional Neural Network (CNN) image classifiers are traditionally designed to have sequential convolutional layers with a single output layer. This is based on the assumption that all target classes should be treated equally and exclusively. However, some classes can be more difficult to distinguish than others, and classes may be organized in a hierarchy of categories. At the same time, a CNN is designed to learn internal representations that abstract from the input data based on its hierarchical layered structure. So it is natural to ask if an inverse of this idea can be applied to learn a model that can predict over a classification hierarchy using multiple output layers in decreasing order of class abstraction. In this paper, we introduce a variant of the traditional CNN model named the Branch Convolutional Neural Network (B-CNN). A B-CNN model outputs multiple predictions ordered from coarse to fine along the concatenated convolutional layers corresponding to the hierarchical structure of the target classes, which can be regarded as a form of prior knowledge on the output. To learn with B-CNNs a novel training strategy, named the Branch Training strategy (BT-strategy), is introduced which balances the strictness of the prior with the freedom to adjust parameters on the output layers to minimize the loss. In this way we show that CNN based models can be forced to learn successively coarse to fine concepts in the internal layers at the output stage, and that hierarchical prior knowledge can be adopted to boost CNN models' classification performance. Our models are evaluated to show that the B-CNN extensions improve over the corresponding baseline CNN on the benchmark datasets MNIST, CIFAR-10 and CIFAR-100.

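Motivation and Objectives

Exploit the class hierarchy inside a CNN to enable interpretable coarse-to-fine predictions.
Introduce the B-CNN architecture, which outputs multiple predictions ordered from the coarsest level of the hierarchy to the finest.
Propose the BT-strategy, which balances the strictness of the hierarchical prior against the freedom of end-to-end learning.
Demonstrate empirical gains over conventional CNN baselines on MNIST, CIFAR-10, and CIFAR-100.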

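Proposed Method

Branch networks are attached at several depths of the CNN, and each branch produces the prediction for one level of the hierarchical label tree.
The loss is defined as a weighted sum of the cross-entropy losses of all hierarchy levels (Equation 1).
The loss weights A_k, which sum to 1, control how much each level contributes to the total loss (Section 3.3).
The Branch Training strategy (BT-strategy) shifts the loss weights from the coarse levels to the fine levels during training, which mitigates the vanishing gradient problem (Section 3.4).
Each branch can be built as a fully connected network on top of the CNN features (the branches are simplified in the experiments).
The evaluation compares B-CNN variants against baselines on MNIST, CIFAR-10, and CIFAR-100, using SGD and standard CNN components (Tables 1-3).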
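
The weighted loss and the weight shifting described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`cross_entropy`, `bcnn_loss`, `weights_for_epoch`) are hypothetical, the epoch boundaries and weight values in `HYPOTHETICAL_SCHEDULE` are illustrative assumptions rather than the paper's settings, and three hierarchy levels are assumed. Each branch is assumed to have already produced a softmax probability vector for its level.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under one probability vector."""
    return -math.log(probs[target])


def bcnn_loss(level_probs, level_targets, weights):
    """Weighted sum of per-level cross-entropy losses, ordered coarse to fine.

    level_probs:   one probability vector per hierarchy level
    level_targets: the true class index at each level
    weights:       the loss weights A_k, expected to sum to 1
    """
    if not (len(level_probs) == len(level_targets) == len(weights)):
        raise ValueError("one probability vector, target, and weight per level")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("loss weights are expected to sum to 1")
    return sum(
        w * cross_entropy(p, t)
        for p, t, w in zip(level_probs, level_targets, weights)
    )


# Hypothetical schedule (NOT taken from the paper): each entry is
# (first epoch at which the weights apply, weights ordered coarse to fine).
HYPOTHETICAL_SCHEDULE = [
    (0, (0.90, 0.05, 0.05)),
    (10, (0.10, 0.80, 0.10)),
    (20, (0.10, 0.20, 0.70)),
    (30, (0.00, 0.00, 1.00)),
]


def weights_for_epoch(epoch, schedule=HYPOTHETICAL_SCHEDULE):
    """Return the weights of the latest schedule entry that has started."""
    current = schedule[0][1]
    for start_epoch, weights in schedule:
        if epoch >= start_epoch:
            current = weights
    return current


# Usage: one sample with made-up branch outputs for three levels.
example_probs = [[0.7, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.5, 0.3]]
example_targets = [0, 1, 2]
early_loss = bcnn_loss(example_probs, example_targets, weights_for_epoch(0))
late_loss = bcnn_loss(example_probs, example_targets, weights_for_epoch(35))
```

Early in training the coarse term dominates `early_loss`; once the schedule reaches its last entry, `late_loss` reduces to the fine-level cross-entropy alone, which is the shift of emphasis the BT-strategy describes.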

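Experimental Results

Research Questions

RQ1: Can a hierarchical class structure be built into a CNN to obtain interpretable coarse-to-fine predictions?
RQ2: Does a branch-based loss combined with the BT-strategy improve CNN performance on hierarchical tasks compared with a flat CNN?
RQ3: How does B-CNN perform relative to conventional CNN baselines on MNIST, CIFAR-10, and CIFAR-100?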

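Key Results

B-CNN models consistently outperform the baseline CNNs on MNIST, CIFAR-10, and CIFAR-100 (Table 3).
On MNIST, B-CNN A reaches 99.40%, against 99.27% for Baseline A.
On CIFAR-10, B-CNN B reaches 84.41%, against 82.35% for Baseline B.
On CIFAR-100, B-CNN B reaches 57.59%, against 51.00% for Baseline B; B-CNN C reaches 64.42%, against 62.92% for Baseline C.
Once the loss focus has shifted to the finer levels, the BT-strategy speeds up learning and can help prevent vanishing gradients.
When the network is initialized from pre-trained parameters, the observed benefit of the BT-strategy is smaller than with random initialization.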
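
To make the size of the reported improvements easier to compare across datasets, the short sketch below computes the absolute accuracy gain for each baseline and B-CNN pair, using only the figures listed above. The relative error reduction is an extra derived quantity added here for illustration and is not reported in the review; the names `REPORTED`, `absolute_gain`, and `relative_error_reduction` are hypothetical.

```python
# Accuracies (%) from the key results listed above: (baseline, B-CNN).
REPORTED = {
    "MNIST, model A": (99.27, 99.40),
    "CIFAR-10, model B": (82.35, 84.41),
    "CIFAR-100, model B": (51.00, 57.59),
    "CIFAR-100, model C": (62.92, 64.42),
}


def absolute_gain(baseline, bcnn):
    """Accuracy improvement in percentage points, rounded to two decimals."""
    return round(bcnn - baseline, 2)


def relative_error_reduction(baseline, bcnn):
    """Share (%) of the baseline's error rate removed by the B-CNN variant."""
    baseline_error = 100.0 - baseline
    return 100.0 * (bcnn - baseline) / baseline_error


GAINS = {
    name: absolute_gain(baseline, bcnn)
    for name, (baseline, bcnn) in REPORTED.items()
}
REDUCTIONS = {
    name: relative_error_reduction(baseline, bcnn)
    for name, (baseline, bcnn) in REPORTED.items()
}

for name in REPORTED:
    print(f"{name}: +{GAINS[name]:.2f} points, "
          f"{REDUCTIONS[name]:.1f}% of baseline error removed")
```

The absolute gains come to 0.13, 2.06, 6.59, and 1.50 percentage points respectively, so the largest improvement among the listed pairs is for model B on CIFAR-100.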

This review was generated by AI and reviewed by a human editor.