QUICK REVIEW

[Paper Review] Self-Supervised Surgical Tool Segmentation using Kinematic Information

Cristian da Costa Rocha, Nicolas Padoy | arXiv (Cornell University) | Feb 13, 2019

Soft Robotics and Applications | References: 37 | Cited by: 42

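One-Sentence Summary

This paper proposes SSTS, a self-supervised method that uses the robot's kinematic model to generate training labels for FCN-based surgical tool segmentation. It achieves performance close to fully supervised training without manual annotation.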

ABSTRACT

Surgical tool segmentation in endoscopic images is the first step towards pose estimation and (sub-)task automation in challenging minimally invasive surgical operations. While many approaches in the literature have shown great results using modern machine learning methods such as convolutional neural networks, the main bottleneck lies in the acquisition of a large number of manually-annotated images for efficient learning. This is especially true in surgical context, where patient-to-patient differences impede the overall generalizability. In order to cope with this lack of annotated data, we propose a self-supervised approach in a robot-assisted context. To our knowledge, the proposed approach is the first to make use of the kinematic model of the robot in order to generate training labels. The core contribution of the paper is to propose an optimization method to obtain good labels for training despite an unknown hand-eye calibration and an imprecise kinematic model. The labels can subsequently be used for fine-tuning a fully-convolutional neural network for pixel-wise classification. As a result, the tool can be segmented in the endoscopic images without needing a single manually-annotated image. Experimental results on phantom and in vivo datasets obtained using a flexible robotized endoscopy system are very promising.

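Motivation and Objectives

Address the shortage of annotated data for surgical tool segmentation by using robot kinematics as the source of labels.
Develop a method that estimates a usable hand-eye transformation despite errors in the kinematic model.
Fine-tune a lightweight FCN online on the self-generated labels for pixel-wise segmentation.
Validate the method on phantom and in vivo endoscopic datasets acquired with a flexible continuum robot.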

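Proposed Method

Model-based label generation: the robot and its estimated shape are projected into the image with the transformation T, giving the projected label y(q, T).
Grabcut-based optimization: a stochastic branch-and-bound search over T in SE(3) maximizes the F'1 score between the Grabcut output and the projected label.
Two-step workflow: (i) compute T* so that the model projection aligns with the image observations; (ii) use the resulting projections as labels to train a Fully Convolutional Network (FCN) for pixel-wise segmentation.
FCN architecture: a ResNet18 backbone with two upsampling paths that produce per-pixel scores, trained with a weighted cross-entropy loss and L2 regularization.
Online fine-tuning: data augmentation and end-to-end training adapt the FCN to the specific surgery and imaging conditions.
Post-processing: Conditional Random Fields (CRFs) refine the FCN's segmentation output.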
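The label-generation and calibration steps above can be illustrated with a minimal pure-Python sketch. Everything concrete in it is an assumption for illustration. The camera intrinsics, the strip-shaped "tool", the rotation restricted to the camera z-axis, and the helper names `make_T`, `project_label`, `f1` and `search_T` are all hypothetical. A plain F1 (Dice) score stands in for the paper's F'1, a synthetic mask stands in for the Grabcut output, and a greedy random search replaces the stochastic branch-and-bound over SE(3). It is not the authors' implementation.

```python
import math
import random

# Hypothetical pinhole intrinsics and image size (not values from the paper).
FX, FY, CX, CY = 100.0, 100.0, 32.0, 24.0
WIDTH, HEIGHT = 64, 48


def make_T(theta, tx, ty, tz):
    """Build T = (R, t) in SE(3); rotation limited to the camera z-axis for brevity."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return R, [tx, ty, tz]


def project_label(points, T):
    """Stand-in for y(q, T): the set of pixels covered by model points seen through T.

    `points` plays the role of the tool shape derived from the robot configuration q (assumed).
    """
    R, t = T
    label = set()
    for p in points:
        x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        if z <= 1e-6:
            continue  # point lies behind (or on) the camera plane
        u = math.floor(FX * x / z + CX)
        v = math.floor(FY * y / z + CY)
        if 0 <= u < WIDTH and 0 <= v < HEIGHT:
            label.add((u, v))
    return label


def f1(a, b):
    """Plain F1 (Dice) overlap of two pixel sets; the paper's F'1 variant may differ."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def search_T(points, observed, params, iters=500, seed=0):
    """Greedy random search over (theta, tx, ty, tz) maximizing F1 against `observed`.

    A simplified substitute for a stochastic branch-and-bound over SE(3).
    """
    rng = random.Random(seed)
    steps = (0.02, 0.005, 0.005, 0.005)  # hypothetical perturbation scales
    best = list(params)
    best_score = f1(project_label(points, make_T(*best)), observed)
    for _ in range(iters):
        cand = [b + rng.gauss(0.0, s) for b, s in zip(best, steps)]
        score = f1(project_label(points, make_T(*cand)), observed)
        if score > best_score:  # accept strict improvements only
            best, best_score = cand, score
    return best, best_score


# Hypothetical tool shaft: a thin planar strip of 3D points (metres).
shaft = [(0.005 * i, 0.005 * j, 0.0) for i in range(-30, 31) for j in range(-3, 4)]
# Synthetic "observation" (stand-in for Grabcut) rendered from a hidden true transform.
observed = project_label(shaft, make_T(0.1, 0.02, -0.01, 0.5))
init = [0.0, 0.0, 0.0, 0.5]  # a rough hand-eye guess
init_score = f1(project_label(shaft, make_T(*init)), observed)
T_star, score = search_T(shaft, observed, init)
print(f"F1: {init_score:.2f} -> {score:.2f}")
```

Because the search only accepts strict improvements, the returned score can never fall below the starting one. The real method additionally has to cope with an imprecise kinematic model and a noisy Grabcut mask, which this toy ignores.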
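The training objective in the FCN bullet (weighted cross-entropy plus L2 regularization) can be written out for the two-class, per-pixel case. This sketch assumes inverse-frequency class weights and a penalty strength `lam`; the review does not state which weighting or coefficient the authors used, and the names `softmax2`, `class_weights` and `loss` are hypothetical.

```python
import math


def softmax2(s_bg, s_tool):
    """Two-class softmax over one pixel's scores; returns P(tool)."""
    m = max(s_bg, s_tool)  # subtract the max for numerical stability
    e_bg, e_tool = math.exp(s_bg - m), math.exp(s_tool - m)
    return e_tool / (e_bg + e_tool)


def class_weights(labels):
    """Hypothetical inverse-frequency weights so that rare tool pixels count more."""
    n, n_tool = len(labels), sum(labels)
    n_bg = n - n_tool
    return n / (2 * max(n_tool, 1)), n / (2 * max(n_bg, 1))


def loss(scores, labels, params, lam=1e-4, eps=1e-12):
    """Mean weighted cross-entropy over pixels plus the L2 penalty lam * ||params||^2."""
    w_tool, w_bg = class_weights(labels)
    ce = 0.0
    for (s_bg, s_tool), y in zip(scores, labels):
        p_tool = softmax2(s_bg, s_tool)
        if y == 1:
            ce -= w_tool * math.log(p_tool + eps)
        else:
            ce -= w_bg * math.log(1.0 - p_tool + eps)
    return ce / len(labels) + lam * sum(w * w for w in params)


# Toy example: 4 pixels, the last one is tool (label 1); `params` mimics network weights.
scores = [(2.0, -1.0), (1.5, -0.5), (0.5, 0.2), (-1.0, 2.0)]
labels = [0, 0, 0, 1]
print(round(loss(scores, labels, params=[0.3, -0.2]), 4))
```

Up-weighting the rarer tool pixels keeps the large background area from dominating the loss; in a real FCN the `params` term would cover all network weights.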

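Experimental Results

Research Questions

RQ1: Can a self-supervised approach that exploits the robot's kinematic model produce reliable labels for surgical tool segmentation without manual annotation?
RQ2: How can the hand-eye transformation be optimized effectively with a Grabcut-based cost function in the presence of kinematic and calibration errors?
RQ3: Does fine-tuning an FCN on self-generated labels approach the performance of fully supervised learning on phantom and in vivo data?
RQ4: What effect does pretraining on endoscopic-domain data have on segmentation performance in the in vivo setting?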

Key Findings

Dataset	Approach	Accuracy	IoU	Recall	Precision
Phantom 1	SSTS	0.99	0.86	0.90	0.92
Phantom 1	FSL	0.99	0.87	0.92	0.93
Phantom 1	Grabcut	0.97	0.56	0.86	0.61
Phantom 2	SSTS	0.98	0.78	0.88	0.87
Phantom 2	FSL	0.98	0.84	0.88	0.94
Phantom 2	Grabcut	0.95	0.49	0.66	0.66
In Vivo	SSTS	0.97	0.62	0.66	0.91
In Vivo	FSL	0.98	0.72	0.73	0.98
In Vivo	Grabcut	0.96	0.55	0.73	0.69
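For reference, the four columns can be computed from pixel counts as below. This is a sketch of the standard per-mask definitions; the review does not say whether the authors averaged per image or pooled pixels over a dataset, and the function name and toy masks are hypothetical.

```python
def segmentation_metrics(pred, gt, n_pixels):
    """Accuracy, IoU, recall and precision for one binary mask.

    `pred` and `gt` are sets of (row, col) tool pixels; every other pixel is background.
    """
    tp = len(pred & gt)  # tool pixels correctly predicted
    fp = len(pred - gt)  # background predicted as tool
    fn = len(gt - pred)  # tool pixels missed
    tn = n_pixels - tp - fp - fn
    # Convention (assumed): an empty denominator counts as a perfect score.
    return {
        "accuracy": (tp + tn) / n_pixels,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 1.0,
        "recall": tp / (tp + fn) if tp + fn else 1.0,
        "precision": tp / (tp + fp) if tp + fp else 1.0,
    }


# Toy 10x10 image: the tool occupies columns 4-5; the prediction misses column 5
# and adds one stray pixel.
gt = {(r, c) for r in range(10) for c in (4, 5)}
pred = {(r, 4) for r in range(10)} | {(0, 9)}
m = segmentation_metrics(pred, gt, n_pixels=100)
print({k: round(v, 3) for k, v in m.items()})
```

In the toy mask, accuracy stays at 0.89 while IoU drops to about 0.48, because the many correctly labeled background pixels inflate accuracy. This is a plausible reason why the Accuracy column barely separates the approaches while IoU does.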

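On both phantom and in vivo data, the Grabcut-based cost used to optimize T* correlates with the IoU against ground truth (GT), so the optimization can select meaningful labels without any ground-truth annotation.
SSTS comes close to fully supervised learning (FSL) on phantom 1, phantom 2 and in vivo data in IoU, recall and precision, with the IoU gap smallest on phantom 1 and largest in vivo.
On phantom 1, SSTS reaches 0.99 accuracy and 0.86 IoU, close to FSL's 0.99 accuracy and 0.87 IoU.
On phantom 2, SSTS reaches 0.98 accuracy and 0.78 IoU, compared with FSL's 0.98 accuracy and 0.84 IoU.
On in vivo data, SSTS reaches 0.97 accuracy and 0.62 IoU versus 0.98 and 0.72 for FSL; the Grabcut baseline reaches only 0.55 IoU.
Pretraining on the endoscopic domain yields better ROC performance than ImageNet pretraining, highlighting the benefit of domain-specific pretraining for endoscopic data.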
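The IoU gaps behind the findings above can be read directly off the results table. The snippet below only does that arithmetic on the copied values; the variable names are illustrative.

```python
# IoU values copied from the results table (FSL = fully supervised learning).
iou = {
    "Phantom 1": {"SSTS": 0.86, "FSL": 0.87, "Grabcut": 0.56},
    "Phantom 2": {"SSTS": 0.78, "FSL": 0.84, "Grabcut": 0.49},
    "In Vivo": {"SSTS": 0.62, "FSL": 0.72, "Grabcut": 0.55},
}

# Round to two decimals to remove floating-point noise from the subtraction.
gaps = {name: round(row["FSL"] - row["SSTS"], 2) for name, row in iou.items()}
gains = {name: round(row["SSTS"] - row["Grabcut"], 2) for name, row in iou.items()}

for name in iou:
    print(f"{name}: FSL - SSTS = {gaps[name]:.2f}, SSTS - Grabcut = {gains[name]:.2f}")
```

This gives an FSL - SSTS gap of 0.01, 0.06 and 0.10 IoU on phantom 1, phantom 2 and in vivo data, while the margin of SSTS over Grabcut shrinks from 0.30 and 0.29 on the phantoms to 0.07 in vivo.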

This review was generated by AI and reviewed by human editors.