QUICK REVIEW

[Paper Review] Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects

Haoping Xu, Yi Ru Wang|arXiv (Cornell University)|Sep 30, 2021

Advanced Vision and Imaging|Cited by 25

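One-line summary

TranspareNet jointly completes point clouds and depth maps to recover the depth of transparent objects. It exploits the distorted depth readings to produce complete depth maps in cluttered scenes, and introduces the automatically annotated TODD dataset.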

ABSTRACT

The basis of many object manipulation algorithms is RGB-D input. Yet, commodity RGB-D sensors can only provide distorted depth maps for a wide range of transparent objects due to light refraction and absorption. To tackle the perception challenges posed by transparent objects, we propose TranspareNet, a joint point cloud and depth completion method, with the ability to complete the depth of transparent objects in cluttered and complex scenes, even with partially filled fluid contents within the vessels. To address the shortcomings of existing transparent object data collection schemes in literature, we also propose an automated dataset creation workflow that consists of robot-controlled image collection and vision-based automatic annotation. Through this automated workflow, we created Toronto Transparent Objects Depth Dataset (TODD), which consists of nearly 15000 RGB-D images. Our experimental evaluation demonstrates that TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets, including ClearGrasp, and that it also handles cluttered scenes when trained on TODD. Code and dataset will be released at https://www.pair.toronto.edu/TranspareNet/

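Motivation and objectives

Robust perception of transparent objects is needed in robotics, where RGB-D depth on such objects is unreliable.
Propose a joint point cloud and depth completion pipeline that recovers the complete depth of transparent vessels.
Create TODD, a large-scale real-world transparent object depth dataset with automatic labeling.
Show that exploiting the depth distortion around transparent objects improves the robustness of downstream depth completion and object manipulation.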

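Proposed method

De-project the depth of each transparent object into a point cloud.
Predict the completed point cloud using a Gridding layer, a 3D CNN encoder-decoder, and a Gridding Reverse layer.
Project the completed point cloud back into the depth domain and fuse it with the Depth Completion module.
Refine the depth with an encoder-decoder depth completion network that uses SPADE normalization, conditioned on the object mask.
Fuse RGB with the sparse depth to generate a dense depth map; train with Gridding Loss and a log-L1 depth loss.
Provide an automated dataset creation and annotation pipeline based on robot-controlled image collection and AprilTag-based pose alignment.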
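
The de-projection and re-projection steps listed above can be sketched with a standard pinhole camera model. This is a minimal illustration and not the authors' code: the function names `deproject` and `project`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a depth of 0 marks a missing reading are all assumptions, and the learned point cloud completion that sits between the two calls is omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
DepthMap = List[List[float]]


def deproject(depth: DepthMap, fx: float, fy: float,
              cx: float, cy: float) -> List[Point]:
    """Lift every valid pixel (depth > 0) to a 3D point in the camera frame."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # assumed convention: 0 means no depth reading
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def project(points: List[Point], width: int, height: int,
            fx: float, fy: float, cx: float, cy: float) -> DepthMap:
    """Render points into a depth map, keeping the nearest point per pixel."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # points behind the camera cannot be projected
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Hypothetical 2x2 depth map (metres) and made-up intrinsics.
depth_in = [[0.0, 1.0],
            [2.0, 0.0]]
cloud = deproject(depth_in, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
depth_out = project(cloud, width=2, height=2,
                    fx=100.0, fy=100.0, cx=0.5, cy=0.5)
```

In a pipeline of this shape, a completion model would add or move points in `cloud` before `project` is called, so the re-projected map can contain depth at pixels that were empty in the sensor input.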

Experimental results

Research questions

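RQ1: Can depth completion be improved by exploiting, rather than discarding, the depth information at transparent object locations?
RQ2: Does the joint point cloud and depth completion approach improve accuracy over state-of-the-art methods for transparent objects in cluttered scenes?
RQ3: Is TODD a realistic and scalable dataset for training and evaluating transparent object perception in complex scenes?
RQ4: How does TranspareNet perform on known objects, novel objects, and cluttered multi-object settings?
RQ5: How much do the point cloud completion (PCC) and depth completion (DC) components each contribute, and how does that compare with the combined pipeline?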

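Key findings

TranspareNet outperforms state-of-the-art depth completion methods on transparent object benchmarks (e.g., ClearGrasp).
The joint PCC+DC pipeline improves depth completion over DC alone, especially in cluttered scenes.
TODD provides ~14.7k real RGB-D images with ground-truth depth, instance segmentation, and 6DoF poses for six glass vessels against varying backgrounds.
The automated TODD data collection and annotation pipeline enables scalable labeling with minimal human intervention.
On TODD, TranspareNet performs strongly on both known and novel objects across object counts (1–3) and clutter levels.
The results validate the approach of exploiting transparency-induced depth distortion to produce a coarse depth estimate that aids refinement.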

This review was generated by AI and checked by a human editor.