QUICK REVIEW

[Paper Review] 2018 Robotic Scene Segmentation Challenge

Max Allan, Satoshi Kondo | arXiv (Cornell University) | Jan 30, 2020
Surgical Simulation and Training | References: 17 | Citations: 118
One-line summary

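This paper presents the 2018 EndoVis robotic scene segmentation challenge and introduces a multi-team benchmark covering anatomical and medical device classes and 19 porcine endoscope sequences, evaluated by mean IoU across four test datasets.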

ABSTRACT

In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec we introduced the robotic instrument segmentation dataset with 10 teams participating in the challenge to perform binary, articulating parts and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background and was widely addressed with modifications on U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs.

Motivation and objectives

  • Extend semantic segmentation to both medical devices and anatomy in robot-assisted surgery.
  • Provide a challenging, variable dataset with realistic instrument motion and background tissue.
  • Benchmark diverse deep learning architectures on pixel-wise endoscopic scene segmentation.
  • Highlight labeling challenges such as 'covered kidney' to reflect surgical realism.

Proposed methods

  • Multiple leading CNN architectures were submitted, including ResNeXt-101 with Squeeze-Excitation blocks, U-Net with VGG-19 encoder, and DeepLab V3+ variants.
  • Approaches used various encoders/decoders (ResNet, VGG, Xception, PSPNet, GCN) with data augmentations and class-specific loss objectives.
  • Two-branch or multi-task strategies were explored (separate networks for instruments vs. organs, ensemble methods, and post-processing like CRF).
  • Evaluation relied on mean intersection over union (IoU) computed per frame and averaged over frames and datasets.
  • Datasets were annotated to include medical devices (instruments, ultrasound probes, clips) and anatomical classes (kidney parenchyma, covered kidney, small intestine) with a background class.
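The evaluation rule above (mean IoU computed per frame, then averaged over frames and datasets) can be sketched as follows. This is a minimal illustration, not the challenge's official scoring code: the function names are hypothetical, label maps are assumed to be 2D grids of integer class ids, and skipping classes that are absent from both prediction and ground truth in a frame is an assumption made here, since the summary does not say how such classes are handled.

```python
from statistics import mean


def frame_mean_iou(pred, target, num_classes):
    """Mean IoU for one frame, averaged over the classes present in it.

    `pred` and `target` are 2D label maps (rows of integer class ids).
    Assumption: a class absent from both maps is skipped, not scored.
    """
    pred_px = [c for row in pred for c in row]
    target_px = [c for row in target for c in row]
    ious = []
    for cls in range(num_classes):
        pairs = list(zip(pred_px, target_px))
        inter = sum(p == cls and t == cls for p, t in pairs)
        union = sum(p == cls or t == cls for p, t in pairs)
        if union == 0:
            continue  # class does not occur in this frame
        ious.append(inter / union)
    return mean(ious) if ious else None


def dataset_mean_iou(preds, targets, num_classes):
    """Average the per-frame mean IoU over all frames of one dataset."""
    scores = [frame_mean_iou(p, t, num_classes) for p, t in zip(preds, targets)]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else None


def cross_dataset_score(per_dataset_scores):
    """Unweighted mean over datasets (hypothetical aggregation step)."""
    return mean(per_dataset_scores)


# Toy 2x2 frame with three possible classes; class 2 never appears.
toy_pred = [[0, 0], [1, 1]]
toy_target = [[0, 1], [1, 1]]
toy_score = frame_mean_iou(toy_pred, toy_target, num_classes=3)
```

Averaging per frame first means a short sequence of hard frames can pull a dataset's score down noticeably, which is consistent with the spread between test datasets reported in the findings.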

Experimental results

Research questions

  • RQ1: What is the segmentation performance (mean IoU) of state-of-the-art models on the EndoVis 2018 segmentation challenge across instrument and anatomical classes?
  • RQ2: Which architectures and data augmentation strategies yield the best mean IoU across diverse test scenarios?
  • RQ3: How do anatomical labeling challenges (e.g., covered kidney) and tissue occlusions affect segmentation performance?
  • RQ4: How does model performance vary across four distinct test datasets that simulate different surgical views and occlusions?

Key findings

  • Test Dataset 1: average IoU across methods is around 0.5–0.67 for key classes, with some teams scoring near 0.9 on specific classes such as kidney parenchyma.
  • Test Dataset 2: overall averages are around 0.45–0.48; instrument and parenchyma classes are generally predicted better than heavily occluded labels such as covered kidney.
  • Test Dataset 3: several methods reach higher averages of around 0.65–0.70, but kidney-facing surfaces remain difficult when heavily covered.
  • Test Dataset 4: overall averages are lower (~0.28–0.38), reflecting strong occlusion and complex backgrounds; the kidney surface is often the hardest class.
  • The cross-dataset averages (Table V) show that methods score well on some datasets and markedly worse on occluded or complex scenes, with an aggregate average IoU of about 0.478.
  • OTH Regensburg, NCT, and IRCAD were among the leading teams across datasets, suggesting that DeepLab-based and encoder-decoder models are consistently strong choices.


This review was generated by AI and checked by a human editor.