QUICK REVIEW

[Paper Review] SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Dimitrios Psychogyios, Emanuele Colleoni|arXiv (Cornell University)|Dec 31, 2023

Surgical Simulation and Training | Cited by 10

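One-Sentence Summary

This paper introduces SAR-RARP50, a multimodal, publicly available in-vivo dataset for surgical action recognition and semantic segmentation of instruments in robot-assisted radical prostatectomy, together with a challenge that compares single-task and multitask learning approaches.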

ABSTRACT

Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

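Motivation and Objectives

Advance robust action recognition and instrument segmentation on real-world, multi-center robotic surgery data.
Provide a large, labeled in-vivo dataset that captures varied lighting, occlusion, and bleeding conditions, enabling realistic evaluation.
Enable evaluation of single-task versus multitask learning methods on related tasks.
Encourage the development of methods that exploit cross-task relationships to improve prediction accuracy.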

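Proposed Approach

Release a multimodal dataset of 50 suturing video segments from robot-assisted radical prostatectomy, with action and segmentation labels.
Define two tasks (action recognition and semantic instrument segmentation), plus a multitask setting that combines them through a shared representation.
Establish evaluation metrics: frame-wise accuracy and segmental F1@K for action recognition; mIoU and NSD for segmentation; and a combined multitask score.
Solicit and analyze single-task and multitask deep learning submissions from multiple teams.
Provide a comprehensive description of the baseline and submitted methods and their architectural choices.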
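
The metrics named above can be illustrated with a minimal Python sketch. It is not the challenge's official evaluation code. It assumes per-frame integer action labels, flattened integer class masks, the commonly used segmental F1@K definition (a predicted segment counts as a true positive when its temporal IoU with an unmatched ground-truth segment of the same label reaches the threshold), and an mIoU that skips classes absent from both masks. NSD and the combined multitask score are left out because this review does not give their exact definitions. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, int]  # (label, start_frame, end_frame_exclusive)


def framewise_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of frames whose predicted action label matches the ground truth."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def to_segments(labels: Sequence[int]) -> List[Segment]:
    """Collapse per-frame labels into runs of identical labels."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def segmental_f1(pred: Sequence[int], gt: Sequence[int], overlap: float = 0.1) -> float:
    """Segmental F1 at a temporal-IoU threshold (assumed definition, see lead-in)."""
    pred_segs, gt_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for label, p_start, p_end in pred_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, g_start, g_end) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = max(p_end, g_end) - min(p_start, g_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= overlap and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True  # each ground-truth segment matches once
        else:
            fp += 1
    fn = matched.count(False)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_iou(pred_mask: Sequence[int], gt_mask: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes present in either mask (flattened class-id masks)."""
    if len(pred_mask) != len(gt_mask):
        raise ValueError("masks must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in zip(pred_mask, gt_mask))
        union = sum(p == c or g == c for p, g in zip(pred_mask, gt_mask))
        if union:  # skip classes absent from both masks (an assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with ground truth `[0, 0, 0, 0, 1, 1, 1, 1, 2, 2]` and prediction `[0, 0, 0, 1, 1, 1, 1, 1, 2, 2]`, `framewise_accuracy` gives 0.9, while `segmental_f1` at `overlap=0.5` gives 1.0, because every predicted segment still overlaps its ground-truth counterpart by at least half. The segmental score tolerates small boundary shifts that frame-wise accuracy counts as errors, which is one reason the two are typically reported together.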

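Experimental Results

Research Questions

RQ1: Can multitask learning that exploits segmentation information improve action recognition in real surgical video?
RQ2: How do state-of-the-art single-task models perform on real-world in-vivo RARP data compared with training on controlled datasets?
RQ3: How do multimodal information and temporal consistency affect segmentation and action-labeling accuracy?
RQ4: Do cross-task relationships between action cues and instrument appearance yield appreciable gains over single-task baselines?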

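Key Findings

Twelve teams participated in SAR-RARP50, contributing seven action recognition methods, nine instrument segmentation techniques, and four multitask approaches.
The dataset contains 50 suturing video segments (DVC suturing), with instrument segmentation masks at 1 Hz and action annotations at the video frame rate, capturing varied real-world conditions.
By integrating the action recognition and instrument segmentation tasks, the challenge demonstrated the feasibility and value of multitask learning.
Participants explored Transformer-based, CNN-based, and hybrid architectures, and used a variety of cross-task strategies and test-time augmentation.
The dataset and challenge establish a benchmark for in-vivo robotic surgery understanding, highlighting both the gains from cross-task learning and its limitations under real-world variability.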

This review was generated by AI and reviewed by human editors.