QUICK REVIEW

[Paper Review] SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Dimitrios Psychogyios, Emanuele Colleoni|arXiv (Cornell University)|2023. 12. 31.

Surgical Simulation and Training | Citations: 10

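One-Line Summary

This paper presents SAR-RARP50, a multimodal, publicly available in-vivo dataset for surgical action recognition and semantic segmentation of instruments during Robot-Assisted Radical Prostatectomy (RARP), together with a challenge that explores single-task and multitask learning approaches.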

ABSTRACT

Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

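Research Motivation and Goals

Promote robust action recognition and instrument segmentation on real, multi-center robotic surgery data.
Provide a large, labeled in-vivo dataset that captures varied lighting, occlusion, and blood for realistic evaluation.
Enable the evaluation of single-task and multitask learning approaches on correlated tasks.
Encourage the development of methods that exploit cross-task relationships to improve prediction accuracy.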

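Proposed Method

Releases a multimodal dataset of 50 suturing video segments from robot-assisted radical prostatectomy, annotated with action and segmentation labels.
Defines two tasks (action recognition and semantic instrument segmentation) and a multitask setting that combines them through a shared representation.
Establishes the evaluation metrics: frame-wise accuracy and segmental F1@K for action recognition; mIoU (mean intersection over union) and NSD (normalized surface Dice) for segmentation; and a combined multitask score.
Solicits and analyzes submissions from multiple teams that apply single-task and multitask deep learning approaches.
Provides a comprehensive description of the baselines and architectural choices of the submitted methods.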
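
The action recognition and segmentation metrics listed above can be illustrated with a minimal sketch. It assumes the commonly used segmental F1@k definition (a predicted segment counts as a true positive when its temporal IoU with a not-yet-matched ground-truth segment of the same label reaches the threshold k) and a class-averaged IoU over flattened label maps. The function names are hypothetical, and this is not the challenge's official evaluation code. NSD and the combined multitask score are left out because the review does not define them.

```python
from itertools import groupby


def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted action label equals the ground truth."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def to_segments(labels):
    """Collapse a per-frame label sequence into (label, start, end) runs (end exclusive)."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def segmental_f1(pred, gt, iou_threshold=0.5):
    """Segmental F1 at a temporal IoU threshold (assumed standard definition)."""
    pred_segs, gt_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for label, ps, pe in pred_segs:
        # Find the same-label ground-truth segment with the highest IoU.
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        # Each ground-truth segment may be matched at most once.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = len(gt_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_iou(pred_pixels, gt_pixels, num_classes):
    """Mean IoU over classes that appear in either flattened label map."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(pred_pixels, gt_pixels))
        inter = sum(1 for p, g in pairs if p == c and g == c)
        union = sum(1 for p, g in pairs if p == c or g == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with ground truth `[0, 0, 0, 1, 1, 1, 2, 2]` and prediction `[0, 0, 1, 1, 1, 1, 2, 2]`, the frame-wise accuracy is 0.875 and the segmental F1 at threshold 0.5 is 1.0, since every predicted segment overlaps its ground-truth counterpart by at least half. This is why the two metrics are reported together: frame accuracy penalizes boundary drift, while segmental F1 penalizes over-segmentation.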

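Experimental Results

Research Questions

RQ1: Can multitask learning that exploits segmentation information on correlated tasks improve action recognition in real surgical video?
RQ2: How do state-of-the-art single-task models perform on real-world in-vivo RARP data compared with training on controlled datasets?
RQ3: How do multimodal information and temporal consistency affect segmentation and action labeling accuracy?
RQ4: Do cross-task relationships between action cues and instrument appearance yield measurable gains over single-task baselines?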

Key Results

Twelve teams participated in SAR-RARP50, contributing seven action recognition methods, nine instrument segmentation techniques, and four multitask approaches.
The dataset comprises 50 suturing video segments (DVC suturing) with 1 Hz segmentation masks for instruments and frame-rate action annotations, capturing diverse real-world conditions.
The challenge demonstrated the feasibility and value of multitask learning by integrating action and instrument segmentation tasks.
Participants explored transformer-, CNN-, and hybrid-based architectures, with various cross-task leveraging strategies and test-time augmentations.
The dataset and challenge establish a benchmark for in-vivo robotic surgery understanding, highlighting cross-task benefits and limitations in real-world variabilities.
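
Because the segmentation masks are sampled at 1 Hz while the action labels are denser, a multitask pipeline has to pick out the frames that carry both kinds of annotation. The sketch below shows one way to do that. It assumes one action label per video frame, and the frame rate `fps` is a hypothetical parameter that the review does not state; the function names are likewise illustrative.

```python
def mask_frame_indices(num_frames, fps, mask_hz=1.0):
    """Indices of frames that carry a segmentation mask when masks are sampled at mask_hz.

    fps is an assumed video frame rate; it is not given in the review.
    """
    step = round(fps / mask_hz)
    return list(range(0, num_frames, step))


def actions_at_masks(frame_actions, fps, mask_hz=1.0):
    """Pair each mask-bearing frame index with the action label on that frame."""
    indices = mask_frame_indices(len(frame_actions), fps, mask_hz)
    return [(i, frame_actions[i]) for i in indices]
```

The resulting pairs are the only samples on which a joint loss over both tasks can be computed directly; the remaining frames provide action supervision only.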

This review was generated by AI and reviewed by a human editor.