QUICK REVIEW

[Paper Review] RedNet: Residual Encoder-Decoder Network for Indoor RGB-D Semantic Segmentation

Jindong Jiang, Lunan Zheng|arXiv (Cornell University)|Jun 4, 2018

Remote Sensing and LiDAR Applications | References: 37 | Citations: 181

One-line summary

RedNet is a residual encoder-decoder network that uses RGB-D fusion and pyramid supervision; with a ResNet-50 backbone it achieves 47.8% mIoU on SUN RGB-D.

ABSTRACT

Indoor semantic segmentation has always been a difficult task in computer vision. In this paper, we propose an RGB-D residual encoder-decoder architecture, named RedNet, for indoor RGB-D semantic segmentation. In RedNet, the residual module is applied to both the encoder and decoder as the basic building block, and the skip-connection is used to bypass the spatial feature between the encoder and decoder. In order to incorporate the depth information of the scene, a fusion structure is constructed, which makes inference on RGB image and depth image separately, and fuses their features over several layers. In order to efficiently optimize the network's parameters, we propose a 'pyramid supervision' training scheme, which applies supervised learning over different layers in the decoder, to cope with the problem of gradients vanishing. Experiment results show that the proposed RedNet(ResNet-50) achieves a state-of-the-art mIoU accuracy of 47.8% on the SUN RGB-D benchmark dataset.

Motivation and objectives

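Improve indoor RGB-D semantic segmentation with a deep encoder-decoder architecture.
Incorporate depth information through a dual-branch RGB-D fusion strategy.
Mitigate vanishing gradients with pyramid supervision applied across the decoder layers.
Use residual blocks in both the encoder and the decoder to make a deeper network trainable.
Evaluate RedNet on SUN RGB-D to benchmark its performance.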

Proposed method

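Build dual-branch RGB and depth encoders from residual blocks, using either ResNet-50 or ResNet-34.
Fuse depth features into the RGB branch by element-wise summation at several layers.
Implement upsampling residual units in the decoder to recover high resolution.
Apply pyramid supervision by adding side outputs, each with its own loss, at several decoder layers.
Train with a weighted cross-entropy loss that uses median frequency balancing (the encoders are pretrained on ImageNet).
When using ResNet-50, optionally insert agent layers to reduce memory usage.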
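
The loss described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the class weights follow the commonly used definition of median frequency balancing (median class frequency divided by each class's frequency), the per-scale losses are simply summed, the labels are downsampled by striding, and the weighted loss is normalized by the sum of the weights. None of these details are confirmed by the review.

```python
import math
from statistics import median


def median_frequency_weights(label_maps, num_classes):
    """Class weights as median(freq) / freq_c.

    freq_c = pixels of class c / total pixels of the images that contain c.
    label_maps is a list of H x W nested lists of integer class ids.
    """
    class_pixels = [0] * num_classes
    image_pixels = [0] * num_classes
    for labels in label_maps:
        flat = [c for row in labels for c in row]
        for c in set(flat):
            class_pixels[c] += flat.count(c)
            image_pixels[c] += len(flat)
    freqs = [class_pixels[c] / image_pixels[c] if image_pixels[c] else 0.0
             for c in range(num_classes)]
    med = median(f for f in freqs if f > 0)
    # Classes that never occur get weight 0 (assumption).
    return [med / f if f > 0 else 0.0 for f in freqs]


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean cross-entropy; logits is H x W x C, labels is H x W."""
    total, norm = 0.0, 0.0
    for logit_row, label_row in zip(logits, labels):
        for scores, target in zip(logit_row, label_row):
            peak = max(scores)  # subtract the max for a stable log-sum-exp
            log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += weights[target] * (log_z - scores[target])
            norm += weights[target]
    return total / norm if norm else 0.0


def downsample_labels(labels, factor):
    """Keep every factor-th pixel in both directions (assumed scheme)."""
    return [row[::factor] for row in labels[::factor]]


def pyramid_loss(side_logits, labels, weights):
    """Sum the loss over side outputs.

    side_logits maps a downsampling factor to the logits predicted at that
    resolution; factor 1 is the final full-resolution output.
    """
    return sum(
        weighted_cross_entropy(logits, downsample_labels(labels, factor), weights)
        for factor, logits in side_logits.items()
    )
```

Because every side output contributes its own loss term, the shallower decoder stages receive a gradient signal directly rather than only through the full-resolution output, which is the motivation the review gives for pyramid supervision.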

Experimental results

Research questions

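RQ1: Can a residual encoder-decoder with RGB-D fusion outperform existing indoor RGB-D segmentation models?
RQ2: Does fusing depth at multiple encoder layers improve segmentation accuracy?
RQ3: Does pyramid supervision improve optimization and final performance?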

Key findings

Model	Pixel acc. (%)	Mean acc. (%)	mIoU (%)
RedNet(ResNet-34) without pyramid	80.3	55.5	45.0
RedNet(ResNet-34)	80.8	58.3	46.8
RedNet(ResNet-50) without pyramid	80.5	57.4	46.0
RedNet(ResNet-50)	81.3	60.3	47.8
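
The three columns in the table are not defined in this review. Assuming they follow the usual segmentation definitions (overall pixel accuracy, per-class accuracy averaged over classes, and per-class intersection-over-union averaged over classes), they could be computed from a confusion matrix roughly as follows. The function name and the choice to skip classes absent from the ground truth are assumptions made for illustration.

```python
def segmentation_metrics(confusion):
    """Compute pixel accuracy, mean class accuracy and mIoU.

    confusion[t][p] counts pixels whose true class is t and predicted class is p.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))

    class_acc, class_iou = [], []
    for c in range(num_classes):
        true_c = sum(confusion[c])                                  # ground-truth pixels of c
        pred_c = sum(confusion[r][c] for r in range(num_classes))   # pixels predicted as c
        if true_c == 0:
            continue  # assumption: classes absent from the ground truth are skipped
        hit = confusion[c][c]
        class_acc.append(hit / true_c)
        class_iou.append(hit / (true_c + pred_c - hit))

    return {
        "pixel_acc": correct / total,
        "mean_acc": sum(class_acc) / len(class_acc),
        "miou": sum(class_iou) / len(class_iou),
    }


# Toy two-class example; multiply the values by 100 to compare with the table's percentages.
metrics = segmentation_metrics([[3, 1], [1, 5]])
```

Pixel accuracy is dominated by frequent classes, whereas mean accuracy and mIoU weight every class equally, which is why the latter two are typically much lower on datasets with many rare classes.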

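With pyramid supervision, RedNet-34 achieves 46.8 mIoU, 80.8 pixel accuracy, and 58.3 mean accuracy on SUN RGB-D.
With pyramid supervision, RedNet-50 achieves 47.8 mIoU, 81.3 pixel accuracy, and 60.3 mean accuracy on SUN RGB-D.
Without pyramid supervision, RedNet-34 achieves 45.0 mIoU and RedNet-50 achieves 46.0 mIoU.
With ResNet-50, adding pyramid supervision improves mIoU by 1.8 points over the non-pyramid variant (47.8 vs. 46.0).
Overall, the RedNet variants outperform several prior RGB-D semantic segmentation methods on SUN RGB-D.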


This review was generated by AI and checked by a human editor.