QUICK REVIEW

[Paper Review] nnInteractive: Redefining 3D Promptable Segmentation

Fabian Isensee, Maximilian Rokuss|ArXiv.org|Mar 11, 2025

Computer Graphics and Visualization Techniques | Citations: 6

One-Line Summary

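nnInteractive is a 3D interactive open-set segmentation framework that turns diverse 2D prompts (points, scribbles, boxes, lasso) into full 3D segmentations. It is trained on more than 120 multimodal datasets and is integrated into Napari and MITK for real-world use.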

ABSTRACT

Accurate and efficient 3D segmentation is essential for both clinical and research applications. While foundation models like SAM have revolutionized interactive segmentation, their 2D design and domain shift limitations make them ill-suited for 3D medical images. Current adaptations address some of these challenges but remain limited, either lacking volumetric awareness, offering restricted interactivity, or supporting only a small set of structures and modalities. Usability also remains a challenge, as current tools are rarely integrated into established imaging platforms and often rely on cumbersome web-based interfaces with restricted functionality. We introduce nnInteractive, the first comprehensive 3D interactive open-set segmentation method. It supports diverse prompts, including points, scribbles, boxes, and a novel lasso prompt, while leveraging intuitive 2D interactions to generate full 3D segmentations. Trained on 120+ diverse volumetric 3D datasets (CT, MRI, PET, 3D Microscopy, etc.), nnInteractive sets a new state-of-the-art in accuracy, adaptability, and usability. Crucially, it is the first method integrated into widely used image viewers (e.g., Napari, MITK), ensuring broad accessibility for real-world clinical and research applications. Extensive benchmarking demonstrates that nnInteractive far surpasses existing methods, setting a new standard for AI-driven interactive 3D segmentation. nnInteractive is publicly available: https://github.com/MIC-DKFZ/napari-nninteractive (Napari plugin), https://www.mitk.org/MITK-nnInteractive (MITK integration), https://github.com/MIC-DKFZ/nnInteractive (Python backend).

Motivation and Objectives

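Address the need for accurate, flexible 3D interactive segmentation across modalities and anatomical structures.
Provide a diverse prompt system (points, scribbles, boxes, and lasso) to guide 3D segmentation.
Ensure real-world usability through integration with established imaging platforms.
Leverage large-scale multimodal training data to improve generalization and open-set capability.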

Proposed Method

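Uses a UNet-based nnU-Net backbone with early prompt integration, feeding prompts into the network as additional input channels.
Converts 2D prompts (points, scribbles, boxes, lasso) into 3D masks by encoding the prompts at high resolution.
Implements an extensive prompt-generation and interaction-simulation pipeline for training, including 2D slice sampling, error-region identification, and multiple prompt types.
Introduces an AutoZoom mechanism that adaptively expands the region of interest (ROI) within VRAM limits to refine large structures.
Trains on 64,518 volumes from more than 120 datasets spanning CT, MRI, PET, 3D microscopy, and other modalities, and adds pseudo-labels derived from SuperVoxels to increase diversity.
Models realistic user interaction patterns with simulated user agents (Random, Sunk Cost, Single Interaction).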
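Early prompt integration means the prompts reach the network as extra input channels stacked next to the image. The sketch below shows how such an input could be assembled for click prompts only, assuming a channels-first layout with one previous-mask channel and one positive and one negative click channel. The channel set and the helper names (`rasterize_clicks`, `build_network_input`) are hypothetical and are not taken from the nnInteractive codebase; nested lists stand in for arrays to keep it dependency-free.

```python
from typing import List, Sequence, Tuple

# A 3D volume stored as nested lists indexed [z][y][x].
Volume = List[List[List[float]]]
# Hypothetical click format: (z, y, x, is_positive).
Click = Tuple[int, int, int, bool]


def zeros_like(volume: Volume) -> Volume:
    """Return an all-zero volume with the same D x H x W shape."""
    return [[[0.0 for _ in row] for row in plane] for plane in volume]


def rasterize_clicks(volume: Volume, clicks: Sequence[Click]) -> Tuple[Volume, Volume]:
    """Draw positive and negative clicks into two separate binary channels."""
    positive, negative = zeros_like(volume), zeros_like(volume)
    for z, y, x, is_positive in clicks:
        target = positive if is_positive else negative
        target[z][y][x] = 1.0
    return positive, negative


def build_network_input(image: Volume, previous_mask: Volume,
                        clicks: Sequence[Click]) -> List[Volume]:
    """Stack image, previous mask and prompt channels (channels-first, C x D x H x W)."""
    positive, negative = rasterize_clicks(image, clicks)
    return [image, previous_mask, positive, negative]


if __name__ == "__main__":
    image = [[[0.5] * 4 for _ in range(4)] for _ in range(2)]  # D=2, H=4, W=4
    channels = build_network_input(image, zeros_like(image),
                                   [(1, 2, 3, True), (0, 0, 0, False)])
    print(len(channels))         # number of input channels
    print(channels[2][1][2][3])  # positive-click channel at (z=1, y=2, x=3)
```

Because prompts share the image's voxel grid, the backbone needs no separate prompt encoder in this sketch; adding scribbles, boxes, or lassos would mean rasterizing them into further channels of the same shape.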
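The interaction-simulation pipeline needs a rule for where a simulated user places the next corrective prompt after identifying error regions. One common heuristic, used here as an assumption rather than the paper's exact procedure, puts a click inside the largest connected error region: a positive click in missed foreground, or a negative click in false-positive voxels. This sketch works directly in 3D on sets of voxel coordinates, whereas the paper samples 2D slices; `largest_component` and `next_click` are hypothetical names.

```python
from collections import deque
from typing import Optional, Set, Tuple

# A voxel coordinate (z, y, x); masks are sets of foreground voxels.
Voxel = Tuple[int, int, int]
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_component(voxels: Set[Voxel]) -> Set[Voxel]:
    """Return the largest 6-connected component of a voxel set (empty if none)."""
    remaining = set(voxels)
    best: Set[Voxel] = set()
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


def next_click(gt: Set[Voxel], pred: Set[Voxel]) -> Optional[Tuple[Voxel, bool]]:
    """Simulate one corrective click: (voxel, is_positive), or None if pred == gt."""
    false_neg = largest_component(gt - pred)  # missed foreground
    false_pos = largest_component(pred - gt)  # spurious foreground
    if not false_neg and not false_pos:
        return None
    positive = len(false_neg) >= len(false_pos)
    region = false_neg if positive else false_pos
    n = len(region)
    cz = sum(v[0] for v in region) / n
    cy = sum(v[1] for v in region) / n
    cx = sum(v[2] for v in region) / n
    # Pick the region voxel closest to the centroid (ties broken by coordinate).
    voxel = min(region, key=lambda v: ((v[0] - cz) ** 2 + (v[1] - cy) ** 2
                                       + (v[2] - cx) ** 2, v))
    return voxel, positive
```

Choosing the error voxel nearest the centroid, rather than the centroid itself, guarantees the click lands inside the error even when the region is non-convex.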
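AutoZoom, as summarized above, expands the ROI adaptively while staying within VRAM limits. A minimal sketch of that idea, under stated assumptions: the field of view grows by a multiplicative factor while the prediction still touches the ROI border, up to a cap, and the enlarged ROI is assumed to be resampled back to the fixed network patch size so memory does not grow with the ROI. The growth factor, the cap, and the `touches_border` callback (a stand-in for a forward pass) are hypothetical, not values or APIs from the paper.

```python
import math
from typing import Callable, Tuple

Shape = Tuple[int, int, int]


def roi_shape(patch_size: Shape, zoom: float) -> Shape:
    """Voxel extent of the ROI covered when a patch is zoomed out by `zoom`."""
    return tuple(math.ceil(p * zoom) for p in patch_size)


def autozoom(touches_border: Callable[[float], bool],
             growth: float = 1.5, max_zoom: float = 4.0) -> float:
    """Grow the zoom factor until the prediction fits inside the ROI or the cap is hit.

    `touches_border(zoom)` stands in for running the network on an ROI of
    roi_shape(patch_size, zoom) voxels, resampled to patch_size (so VRAM use
    stays fixed), and checking whether the predicted mask reaches the ROI edge.
    """
    zoom = 1.0
    while zoom < max_zoom and touches_border(zoom):
        zoom = min(zoom * growth, max_zoom)
    return zoom


if __name__ == "__main__":
    patch = (32, 32, 32)
    object_extent = 70  # hypothetical object length in voxels along one axis
    zoom = autozoom(lambda z: patch[0] * z < object_extent)
    print(zoom, roi_shape(patch, zoom))
```

In this toy run the zoom goes 1.0, 1.5, 2.25 and stops once the 72-voxel ROI covers the 70-voxel object; a refinement pass at full resolution could follow, but that step is not modeled here.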

Experimental Results

Research Questions

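RQ1: Can a broad open-set model spanning multiple modalities and structures achieve effective 3D interactive segmentation?
RQ2: Do diverse prompt types (points, scribbles, boxes, lasso) and simulated user interactions improve 3D segmentation performance and usability?
RQ3: Can AutoZoom and multi-round prompting reduce annotation effort while maintaining or improving accuracy?
RQ4: Does training on large-scale multimodal datasets improve generalization to unseen modalities and tasks?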

Key Findings

Scribbled slices	Method	Left ventricle	Right ventricle	Myocardium	Mean
All slices	ScribblePrompt	66.08	90.04	86.82	80.98
All slices	nnInteractive	78.86	92.93	90.07	87.29
3 slices	nnInteractive	74.40	91.24	87.33	84.29
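The finding that nnInteractive beats ScribblePrompt with fewer annotations can be read off the table as score differences. This small sketch only subtracts the reported values, so it adds no new results; `gains_over` is a hypothetical helper name.

```python
# Scores copied from the table: (left ventricle, right ventricle, myocardium, mean).
ROWS = {
    "ScribblePrompt, all slices": (66.08, 90.04, 86.82, 80.98),
    "nnInteractive, all slices": (78.86, 92.93, 90.07, 87.29),
    "nnInteractive, 3 slices": (74.40, 91.24, 87.33, 84.29),
}
COLUMNS = ("left ventricle", "right ventricle", "myocardium", "mean")


def gains_over(baseline: str, method: str) -> dict:
    """Per-column score difference (method minus baseline), rounded to 2 decimals."""
    return {
        col: round(m - b, 2)
        for col, m, b in zip(COLUMNS, ROWS[method], ROWS[baseline])
    }


if __name__ == "__main__":
    base = "ScribblePrompt, all slices"
    for name in ("nnInteractive, all slices", "nnInteractive, 3 slices"):
        print(name, gains_over(base, name))
```

Every difference is positive. With all slices scribbled the mean gain is +6.31, and even with scribbles on only 3 slices nnInteractive stays ahead (mean +3.31, left ventricle +8.32).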

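nnInteractive consistently outperforms state-of-the-art baselines on the test data, regardless of prompt style.
The lasso prompt provides the strongest guidance signal overall and achieves the best Dice scores.
On the expert scribble benchmark, nnInteractive outperforms ScribblePrompt and reaches higher accuracy with fewer annotations: scribbles on only 3 slices still beat ScribblePrompt with scribbles on all slices.
AutoZoom improves segmentation of large objects and reduces the number of iterations needed to converge.
Inference cost fits practical use (≤10 GB VRAM; 120–200 ms for small objects and up to 1160 ms for large objects with AutoZoom), making it suitable for clinical settings.
On radiology tasks, nnInteractive reaches expert-level performance while substantially reducing annotation time (179 ± 114 s vs. 635 ± 343 s).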
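To put the reported annotation times in perspective, the relative saving can be computed from the two mean times alone. This ignores the ± standard deviations, which is a simplification, and `time_savings` is a hypothetical helper.

```python
def time_savings(tool_seconds: float, baseline_seconds: float) -> tuple:
    """Return (percent reduction, speed-up factor) from two mean annotation times."""
    reduction = 100.0 * (1.0 - tool_seconds / baseline_seconds)
    speedup = baseline_seconds / tool_seconds
    return round(reduction, 1), round(speedup, 2)


if __name__ == "__main__":
    # Mean times reported above: 179 s with nnInteractive vs. 635 s for the comparison.
    print(time_savings(179, 635))  # (71.8, 3.55)
```

On the means alone, that is roughly a 72% reduction, or about 3.5 times faster annotation.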


This review was generated by AI and checked by a human editor.