QUICK REVIEW

[Paper Review] BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Changqian Yu, Changxin Gao|arXiv (Cornell University)|Apr 5, 2020

Advanced Neural Network Applications|References: 66|Citations: 115

One-Sentence Summary

BiSeNet V2 combines a two-pathway architecture, with a Detail Branch for spatial details and a Semantic Branch for categorical semantics, with a Bilateral Guided Aggregation Layer and booster training to deliver accurate real-time semantic segmentation: 72.6% mIoU at 156 FPS on the Cityscapes test set.

ABSTRACT

The low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, which leads to a considerable accuracy decrease. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation. To this end, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context. The Semantic Branch is lightweight due to reducing the channel capacity and a fast-downsampling strategy. Furthermore, we design a Guided Aggregation Layer to enhance mutual connections and fuse both types of feature representation. Besides, a booster training strategy is designed to improve the segmentation performance without any extra inference cost. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs favourably against a few state-of-the-art real-time semantic segmentation approaches. Specifically, for a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.
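
To put the reported throughput in perspective, the short calculation below converts the abstract's figures (156 FPS on a 2,048x1,024 input) into a per-frame time budget and a pixel throughput. It is plain arithmetic on the quoted numbers, not a benchmark; the variable names are illustrative.

```python
# Figures quoted in the abstract (Cityscapes test set, one GTX 1080 Ti).
WIDTH, HEIGHT = 2048, 1024
FPS = 156

# Time available to process one frame, in milliseconds.
latency_ms = 1000.0 / FPS

# Number of pixels that receive a class label every second.
pixels_per_second = WIDTH * HEIGHT * FPS

print(f"{latency_ms:.2f} ms per frame")
print(f"{pixels_per_second / 1e6:.1f} megapixels per second")
```

At this rate the whole network has roughly 6.4 ms to label about two million pixels per frame.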

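Motivation and Objectives

Advance real-time semantic segmentation without sacrificing low-level spatial details.
Propose a two-pathway architecture that handles spatial details and semantic context separately.
Design an efficient fusion mechanism that combines the two branches.
Introduce a booster training strategy that improves accuracy without adding inference cost.
Demonstrate effectiveness on the Cityscapes, CamVid, and COCO-Stuff datasets.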

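Proposed Method

The Detail Branch uses wide channels and shallow layers to capture high-resolution spatial details.
The Semantic Branch uses narrow channels and deep layers to capture high-level semantics, relying on lightweight convolutions and fast downsampling.
A Context Embedding Block enlarges the receptive field of the Semantic Branch.
Gather-and-Expansion (GE) Layers build a lightweight yet expressive semantic path.
A Bilateral Guided Aggregation Layer fuses the outputs of the Detail and Semantic Branches, with the semantic context guiding the fusion.
Booster training attaches auxiliary prediction heads that improve accuracy during training and are removed at inference.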
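
The idea behind the Bilateral Guided Aggregation Layer can be illustrated with a deliberately simplified sketch on single-channel feature maps stored as nested lists. It assumes the semantic map is coarser than the detail map by an integer `factor`, and it replaces the layer's learned convolutions and normalization with plain sigmoid gating, nearest-neighbour upsampling, and average pooling. All function names are hypothetical; this is not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def upsample_nearest(fmap, factor):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def avg_pool(fmap, factor):
    """Average non-overlapping factor x factor windows."""
    h, w = len(fmap) // factor, len(fmap[0]) // factor
    return [[sum(fmap[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]

def guided_aggregation(detail, semantic, factor):
    """Fuse a high-resolution detail map with a low-resolution semantic map.

    Assumption: learned convolutions are omitted; only the gating remains.
    """
    # High-resolution path: the upsampled semantic map gates the detail map.
    gate_hi = [[sigmoid(v) for v in row]
               for row in upsample_nearest(semantic, factor)]
    hi = [[d * g for d, g in zip(d_row, g_row)]
          for d_row, g_row in zip(detail, gate_hi)]
    # Low-resolution path: the semantic map gates the pooled detail map.
    pooled = avg_pool(detail, factor)
    lo = [[d * sigmoid(s) for d, s in zip(d_row, s_row)]
          for d_row, s_row in zip(pooled, semantic)]
    # Bring the low-resolution result back up and sum the two paths.
    lo_up = upsample_nearest(lo, factor)
    return [[a + b for a, b in zip(a_row, b_row)]
            for a_row, b_row in zip(hi, lo_up)]

detail = [[1.0] * 4 for _ in range(4)]   # 4x4 high-resolution map
semantic = [[0.0]]                       # 1x1 low-resolution map
fused = guided_aggregation(detail, semantic, factor=4)
```

In contrast to plain summation or concatenation, each branch's contribution here is scaled by a gate computed from the semantic features, which is the sense in which the semantic context guides the fusion.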

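Experimental Results

Research Questions

RQ1: Can BiSeNet V2 achieve high segmentation accuracy while maintaining real-time inference speed?
RQ2: Does separating spatial details from semantic context outperform single-pathway architectures with a comparable computational budget?
RQ3: How effective is the Bilateral Guided Aggregation Layer at fusing multi-scale detail and semantic features?
RQ4: How much does booster training affect final performance, given that it adds no inference cost?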
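
For RQ4, the reason booster training is free at inference can be seen in a minimal sketch of the loss. It assumes a per-pixel cross-entropy loss and equal weighting of the auxiliary heads; both are illustrative choices rather than details stated in this review, and the function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one pixel: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def training_loss(main_logits, aux_logits_list, target):
    """Main-head loss plus one term per auxiliary (booster) head.

    Assumption: every auxiliary head is weighted equally with the main head.
    """
    loss = cross_entropy(main_logits, target)
    for aux_logits in aux_logits_list:
        loss += cross_entropy(aux_logits, target)
    return loss

def predict(main_logits):
    """Inference reads the main head only; auxiliary heads are never run."""
    return max(range(len(main_logits)), key=lambda c: main_logits[c])
```

The auxiliary heads appear only in `training_loss`, so removing them after training leaves `predict`, and therefore inference cost, unchanged.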

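Key Findings

Achieves 72.6% mean IoU on the Cityscapes test set at 156 FPS on a GTX 1080 Ti.
The Detail Branch and Semantic Branch provide complementary information, and fusing them with the Bilateral Guided Aggregation Layer outperforms simple summation or concatenation.
The Semantic Branch stays lightweight yet effective thanks to depthwise convolutions and fast downsampling, while the Detail Branch preserves spatial details.
Booster training improves accuracy without increasing inference cost.
Effectiveness is validated on the Cityscapes, CamVid, and COCO-Stuff datasets.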
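
The accuracy figures above are reported as mean IoU (intersection over union averaged across classes). A minimal sketch of the metric on flat label lists follows; it assumes classes that appear in neither the prediction nor the ground truth are skipped, and it ignores details such as void labels that real benchmark tooling handles. The function name is hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for two equal-length label sequences."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Assumption: classes absent from both inputs do not count.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in either input")
    return sum(ious) / len(ious)

score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this example class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is 7/12, about 0.583.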

This review was generated by AI and checked by a human editor.