QUICK REVIEW

[Paper Review] 2018 Robotic Scene Segmentation Challenge

Max Allan, Satoshi Kondo|arXiv (Cornell University)|Jan 30, 2020

Surgical Simulation and Training | References: 17 | Cited by: 118

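One-Sentence Summary

This paper presents the 2018 EndoVis robotic scene segmentation challenge, which introduces anatomical and medical-device classes, 19 porcine endoscopic sequences, and a multi-team benchmark evaluated by mean IoU across four test datasets.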

ABSTRACT

In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec we introduced the robotic instrument segmentation dataset with 10 teams participating in the challenge to perform binary, articulating parts and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background and was widely addressed with modifications on U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs.

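Research Motivation and Goals

Extend semantic segmentation to medical devices and anatomical structures in robot-assisted surgery.
Provide a challenging, varied dataset with realistic surgical instrument motion and background tissue.
Benchmark multiple deep learning architectures on pixel-level endoscopic scene segmentation.
Highlight annotation challenges, such as 'covered kidney', that reflect surgical realism.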

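Proposed Methods

Teams submitted a range of leading CNN architectures, including ResNeXt-101 with Squeeze-and-Excitation blocks, U-Net with a VGG-19 encoder, and variants of DeepLab V3+.
The methods used a variety of encoders and decoders (ResNet, VGG, Xception, PSPNet, GCN), combined with data augmentation and class-specific loss objectives.
Some teams explored dual-branch or multi-task strategies (separate networks for instruments and organs, ensembling, and post-processing such as CRFs).
Evaluation uses mean intersection-over-union (IoU) computed per frame and then averaged over frames and datasets.
Dataset annotations cover medical devices (instruments, ultrasound probes, clips) and anatomical classes (kidney parenchyma, covered kidney, small intestine), plus a background class.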
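The evaluation metric described above can be sketched in a few lines. This is a minimal illustration, not the challenge's official scoring code. It rests on two assumptions that the summary does not confirm: classes absent from both the prediction and the ground truth of a frame are skipped, and the final score averages frames within each dataset first and then averages the datasets. The function names (frame_miou, dataset_miou, challenge_score) are hypothetical.

```python
from statistics import mean


def frame_miou(pred, truth, num_classes):
    """Mean IoU for one frame; pred and truth are 2D lists of class ids."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Assumption: a class absent from both prediction and ground truth
    # in this frame is skipped rather than scored as 0 or 1.
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return mean(ious)


def dataset_miou(frames, num_classes):
    """Average the per-frame mean IoU over (pred, truth) pairs."""
    return mean(frame_miou(p, t, num_classes) for p, t in frames)


def challenge_score(datasets, num_classes):
    """Assumption: average per-dataset scores with equal weight."""
    return mean(dataset_miou(frames, num_classes) for frames in datasets)


# Toy 2x2 frame with two classes (0 = background, 1 = instrument).
truth = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
score = frame_miou(pred, truth, num_classes=2)  # (1/2 + 2/3) / 2
```

Averaging per frame, rather than pooling pixels over a whole sequence, gives every frame equal weight, so a small or heavily occluded class that is mispredicted in a single frame can pull that frame's score down noticeably.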

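Experimental Results

Research Questions

RQ1: What segmentation performance (mean IoU) do state-of-the-art models achieve on instrument and anatomical classes in the EndoVis 2018 segmentation challenge?
RQ2: Which architectures and data augmentation strategies achieve the best mean IoU across the varied test scenarios?
RQ3: How do anatomical annotation challenges (such as covered kidney) and tissue occlusion affect segmentation performance?
RQ4: How does model performance vary across the four test datasets, which simulate different surgical viewpoints and occlusions?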

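Key Findings

On test dataset 1, methods reach a mean IoU of about 0.5–0.67 on key classes, with some teams approaching 0.9 on specific classes such as kidney parenchyma.
On test dataset 2, overall averages are about 0.45–0.48; instrument and parenchyma classes are generally predicted better than heavily occluded labels such as covered kidney.
On test dataset 3, several methods achieve higher averages of about 0.65–0.70, but frames in which the exposed kidney surface is heavily occluded remain challenging.
On test dataset 4, overall averages are lower (about 0.28–0.38), reflecting strong occlusion and complex backgrounds; the kidney surface is usually the hardest class.
The overall cross-dataset averages (Table V) show that methods perform better on some datasets than on others and degrade in occluded or complex scenes, with an aggregate mean IoU of about 0.478.
Across the datasets, teams such as OTH Regensburg, NCT, and IRCAD led the results, suggesting a consensus around strong architectures such as DeepLab-based and encoder-decoder models.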

This review was generated by AI and reviewed by human editors.