QUICK REVIEW

[Paper Review] BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Changqian Yu, Changxin Gao|arXiv (Cornell University)|2020. 04. 05.

Advanced Neural Network Applications | References: 66 | Citations: 115

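One-Line Summary

BiSeNet V2 achieves accurate real-time semantic segmentation by combining a two-pathway architecture (a Detail Branch for spatial detail and a Semantic Branch for semantic context) with a Bilateral Guided Aggregation Layer and booster training. For example, it reaches 72.6% mIoU at 156 FPS on the Cityscapes test set.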
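To make the two headline numbers concrete, the sketch below computes mean IoU from a small confusion matrix and converts a frame rate into per-frame latency. It uses the standard per-class definition IoU = TP / (TP + FP + FN). The function names and the toy matrix are illustrative assumptions, not values from the paper.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class c pixels predicted as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def latency_ms(fps):
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Hypothetical 2-class confusion matrix, for illustration only.
toy_confusion = [[8, 2],
                 [1, 9]]
print(f"toy mIoU: {mean_iou(toy_confusion):.4f}")
print(f"latency at 156 FPS: {latency_ms(156):.2f} ms per frame")
```

At 156 FPS the model has roughly 6.4 ms to process each frame, which is the budget the architecture is designed around.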

ABSTRACT

The low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, which leads to a considerable accuracy decrease. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation. To this end, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context. The Semantic Branch is lightweight due to reducing the channel capacity and a fast-downsampling strategy. Furthermore, we design a Guided Aggregation Layer to enhance mutual connections and fuse both types of feature representation. Besides, a booster training strategy is designed to improve the segmentation performance without any extra inference cost. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs favourably against a few state-of-the-art real-time semantic segmentation approaches. Specifically, for a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.
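The abstract attributes the Semantic Branch's low cost to reduced channel capacity and fast downsampling. A rough multiply-accumulate (MAC) count shows why both levers matter. This is a back-of-the-envelope sketch: the channel widths, the 1/4 width ratio, and the feature-map sizes are illustrative assumptions rather than the paper's exact configuration, and the depthwise separable variant is included only as one common form of lightweight convolution.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out


# Assumed sizes, for illustration only.
wide = conv_macs(128, 256, 64, 64)            # wide channels, high resolution
narrow = conv_macs(128, 256, 16, 16)          # channels cut to 1/4
narrow_lowres = conv_macs(32, 64, 16, 16)     # additionally downsampled 4x per axis

print(wide // narrow)            # cost ratio from narrowing channels
print(narrow // narrow_lowres)   # cost ratio from downsampling
print(conv_macs(1, 1, 64, 64) / depthwise_separable_macs(1, 1, 64, 64))
```

Because the cost of a standard convolution scales with the product of input and output channels, cutting both to a quarter reduces it by a factor of 16, and downsampling by 4x along each axis reduces it by another factor of 16.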

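Motivation and Objectives

Advance real-time semantic segmentation without sacrificing low-level spatial detail.
Propose a two-pathway architecture that handles spatial detail and semantic context separately.
Design an efficient fusion mechanism that combines the two pathways.
Introduce a booster training strategy that improves accuracy at no extra inference cost.
Demonstrate effectiveness on the Cityscapes, CamVid, and COCO-Stuff datasets.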

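Proposed Method

The Detail Branch uses wide channels and shallow layers to capture high-resolution spatial detail.
The Semantic Branch uses narrow channels and deep layers, with lightweight convolutions and fast downsampling, to capture high-level semantics.
A Context Embedding Block enlarges the receptive field of the Semantic Branch.
Gather-and-Expansion (GE) Layers make the semantic pathway lightweight yet expressive.
A Bilateral Guided Aggregation Layer fuses the Detail Branch and Semantic Branch outputs, guided by semantic context.
Booster training adds auxiliary prediction heads that improve accuracy during training and are discarded at inference.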
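The idea of fusing the two branches under semantic guidance can be illustrated with a stripped-down sketch. It treats each branch output as a single-channel feature map stored as nested lists and keeps only the gating logic: semantic features, passed through a sigmoid, weight the detail features at both the high and the low resolution, and the two results are summed. The convolutions and normalization that a real layer would apply, the 4x resolution gap, and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(fmap, factor):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in fmap:
        wide_row = [v for v in row for _ in range(factor)]
        out.extend([list(wide_row) for _ in range(factor)])
    return out


def avg_pool(fmap, factor):
    """Average non-overlapping factor x factor windows."""
    h, w = len(fmap) // factor, len(fmap[0]) // factor
    return [[sum(fmap[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]


def gate(features, guide):
    """Weight each feature by a sigmoid gate computed from the guide map."""
    return [[f * sigmoid(g) for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(features, guide)]


def guided_aggregation(detail, semantic, factor=4):
    # High-resolution path: upsampled semantic context gates the detail features.
    high = gate(detail, upsample_nearest(semantic, factor))
    # Low-resolution path: downsampled detail features are gated by the semantic map.
    low = gate(avg_pool(detail, factor), semantic)
    # Bring the low-resolution result back up and sum the two paths.
    low_up = upsample_nearest(low, factor)
    return [[a + b for a, b in zip(h_row, l_row)]
            for h_row, l_row in zip(high, low_up)]


# Hypothetical inputs: a 4x4 detail map and a 1x1 semantic map.
detail = [[1.0] * 4 for _ in range(4)]
semantic = [[0.0]]
fused = guided_aggregation(detail, semantic)
print(len(fused), len(fused[0]))
```

Unlike plain summation or concatenation, the gate lets the semantic pathway suppress or pass detail features location by location, which is the kind of guidance the layer's name refers to.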

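Experimental Results

Research Questions

RQ1: Can BiSeNet V2 achieve high segmentation accuracy while maintaining real-time inference speed?
RQ2: Under a similar computational budget, does separating spatial detail from semantic context outperform a single-pathway architecture?
RQ3: How effective is the Bilateral Guided Aggregation Layer at fusing multi-scale detail and semantics?
RQ4: How does booster training affect final performance, given that it adds no inference cost?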

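Key Results

Achieves 72.6% mean IoU on the Cityscapes test set at 156 FPS for a 2,048x1,024 input on one NVIDIA GeForce GTX 1080 Ti.
The Detail Branch and Semantic Branch provide complementary information, and fusing them with the Bilateral Guided Aggregation Layer outperforms simple summation or concatenation.
With depthwise convolutions and fast downsampling, the Semantic Branch can be lightweight yet effective, while the Detail Branch preserves spatial detail.
Booster training improves accuracy without increasing inference cost.
Effectiveness is validated on the Cityscapes, CamVid, and COCO-Stuff datasets.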
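Booster training can be pictured as adding losses from auxiliary prediction heads to the main loss during training, while inference evaluates only the main head. The sketch below is a minimal illustration using per-pixel cross-entropy on hand-written probabilities. The function names, the equal weighting of auxiliary losses, and the toy values are assumptions, not details taken from the paper.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood; probs[i] lists class probabilities for pixel i."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def training_loss(main_probs, aux_probs_list, labels, aux_weight=1.0):
    """Main-head loss plus weighted losses from the auxiliary (booster) heads."""
    loss = cross_entropy(main_probs, labels)
    for aux_probs in aux_probs_list:
        loss += aux_weight * cross_entropy(aux_probs, labels)
    return loss


def predict(main_probs):
    """Inference reads only the main head, so auxiliary heads add no cost."""
    return [max(range(len(p)), key=p.__getitem__) for p in main_probs]


# Hypothetical two-pixel, two-class example.
labels = [0, 1]
main = [[0.9, 0.1], [0.2, 0.8]]
aux_heads = [[[0.6, 0.4], [0.5, 0.5]]]   # one auxiliary head

print(f"training loss: {training_loss(main, aux_heads, labels):.4f}")
print(f"prediction:    {predict(main)}")
```

Because the auxiliary heads appear only in `training_loss`, removing them after training leaves `predict` and its runtime unchanged.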


This review was generated by AI and reviewed by a human editor.