QUICK REVIEW

[Paper Review] SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Dimitrios Psychogyios, Emanuele Colleoni|arXiv (Cornell University)|Dec 31, 2023

Surgical Simulation and Training | Cited by 10

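One-line summary

This paper presents SAR-RARP50, a multimodal, publicly available in-vivo dataset for surgical action recognition and semantic instrument segmentation during Robotic-Assisted Radical Prostatectomy (RARP), together with a challenge that explores single-task and multitask learning approaches.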

ABSTRACT

Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

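Motivation and objectives

Motivate robust action recognition and instrument segmentation on real-world, multi-center robotic surgery data.
Provide a large, labeled in-vivo dataset that captures varied lighting, occlusion, and bleeding for realistic evaluation.
Enable evaluation of single-task and multitask learning approaches on correlated tasks.
Encourage the development of methods that exploit cross-task relationships to improve prediction accuracy.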

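Proposed method

Release 50 suturing video segments from Robotic-Assisted Radical Prostatectomy with action recognition and segmentation labels.
Define two tasks (action recognition and semantic instrument segmentation) plus a multitask setting that combines both through shared representations.
Establish evaluation metrics: frame-wise accuracy and segmental F1@K for action recognition; mean Intersection over Union (mIoU) and Normalized Surface Dice (NSD) for segmentation; and a combined multitask score.
Solicit and analyze submissions from multiple teams applying single-task and multitask deep learning approaches.
Provide a comprehensive description of the baselines and architectural choices of the submitted methods.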
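To make the metrics named above concrete, the following is a minimal pure-Python sketch of frame-wise accuracy, segmental F1@K, and mIoU. It follows the formulation of segmental F1 commonly used in the action segmentation literature; the challenge's official evaluation code may differ in details such as tie handling, the set of K thresholds, or how absent classes are treated in mIoU. All function names here are hypothetical, and NSD and the combined multitask score are deliberately left out.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, int]  # (label, start, end_exclusive)

def frame_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of frames whose predicted action label matches the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def to_segments(labels: Sequence[int]) -> List[Segment]:
    """Collapse a frame-wise label sequence into runs of identical labels."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments

def segmental_f1(pred: Sequence[int], gt: Sequence[int], k: float = 0.10) -> float:
    """Segmental F1@k: a predicted segment counts as a true positive when its
    best-overlapping same-label ground-truth segment has IoU >= k and has not
    already been matched; otherwise it counts as a false positive."""
    pred_segs, gt_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(gt_segs)
    tp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= k and not matched[best_j]:
            matched[best_j] = True
            tp += 1
    fp = len(pred_segs) - tp
    fn = len(gt_segs) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mean_iou(pred_mask: Sequence[int], gt_mask: Sequence[int], num_classes: int) -> float:
    """Mean IoU over flattened label masks. Assumption: classes absent from
    both masks are skipped rather than scored."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in zip(pred_mask, gt_mask))
        union = sum(p == c or g == c for p, g in zip(pred_mask, gt_mask))
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# One spurious frame splits a single true segment into three predictions.
gt = [0] * 10
pred = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(frame_accuracy(pred, gt))        # 0.9
print(segmental_f1(pred, gt, k=0.10))  # 0.5
```

The toy example shows why the two action metrics are reported together: a single mislabeled frame barely changes frame-wise accuracy, but it fragments the prediction into three segments and halves the segmental F1.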

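Experimental results

Research questions

RQ1: Can multitask learning that leverages segmentation information improve action recognition in real surgical videos?
RQ2: How well do state-of-the-art single-task models perform on real-world in-vivo RARP data compared with training on controlled datasets?
RQ3: How do multimodal information and temporal consistency affect the accuracy of segmentation and action labeling?
RQ4: Do cross-task relationships between action cues and instrument appearance yield measurable improvements over single-task baselines?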

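Key findings

Twelve teams participated in SAR-RARP50, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches.
The dataset consists of 50 suturing video segments (DVC suturing), with instrument segmentation masks at 1 Hz and action annotations at frame rate, captured under varied real-world conditions.
By integrating the action and instrument segmentation tasks, the challenge demonstrated the feasibility and value of multitask learning.
Participants explored transformer, CNN, and hybrid architectures and tried a range of cross-task strategies and test-time augmentation.
The dataset and challenge establish a benchmark for in-vivo robotic surgery understanding and highlight the benefits and limits of cross-task learning under real-world variability.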
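Because the segmentation masks are sparser (1 Hz) than the action annotations, a multitask pipeline has to pair each mask with the action label active at the same instant. The sketch below shows one way to do that. The function name, the `label_hz` parameter, the value of 10 used in the example, and the assumption that the first mask falls at t = 0 are all illustrative and are not taken from the dataset documentation.

```python
from typing import List, Sequence, Tuple

def pair_masks_with_actions(
    action_labels: Sequence[int],
    label_hz: int,
    mask_hz: int = 1,
) -> List[Tuple[float, int]]:
    """Return (timestamp_seconds, action_label) for every mask timestamp.

    Assumes both streams start at t = 0 and that label_hz is an integer
    multiple of mask_hz, so each mask lands exactly on one action label.
    """
    if label_hz <= 0 or mask_hz <= 0 or label_hz % mask_hz != 0:
        raise ValueError("label_hz must be a positive multiple of mask_hz")
    step = label_hz // mask_hz  # number of action labels between two masks
    return [
        (i / label_hz, action_labels[i])
        for i in range(0, len(action_labels), step)
    ]

# Three seconds of hypothetical labels at an assumed 10 labels per second:
# action 0 for the first 1.5 s, then action 3.
labels = [0] * 15 + [3] * 15
print(pair_masks_with_actions(labels, label_hz=10))
# [(0.0, 0), (1.0, 0), (2.0, 3)]
```

Pairing by index rather than by floating-point timestamps avoids rounding drift over long videos; if the real annotation files carry explicit timestamps or frame indices, those should be used instead.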


This review was generated by AI and checked by a human editor.