QUICK REVIEW

[Paper Review] Topo-R1: Detecting Topological Anomalies via Vision-Language Models

Meilong Xu, Qingqiao Hu|arXiv (Cornell University)|2026. 03. 13.

Topological and Geometric Data Analysis | Citations: 0

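One-Line Summary

Topo-R1 is a topology-aware vision-language framework that detects and classifies topological errors in tubular structures, trained with a specialized composite reward and an automated multi-domain anomaly-injection benchmark.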

ABSTRACT

Topological correctness is crucial for tubular structures such as blood vessels, nerve fibers, and road networks. Existing topology-preserving methods rely on domain-specific ground truth, which is costly and rarely transfers across domains. When deployed to a new domain without annotations, a key question arises: how can we detect topological anomalies without ground-truth supervision? We reframe this as topological anomaly detection, a structured visual reasoning task requiring a model to locate and classify topological errors in predicted segmentation masks. Vision-Language Models (VLMs) are natural candidates; however, we find that state-of-the-art VLMs perform nearly at random, lacking the fine-grained, topology-aware perception needed to identify sparse connectivity errors in dense structures. To bridge this gap, we develop an automated data-curation pipeline that synthesizes diverse topological anomalies with verifiable annotations across progressively difficult levels, thereby constructing the first large-scale, multi-domain benchmark for this task. We then introduce Topo-R1, a framework that endows VLMs with topology-aware perception via two-stage training: supervised fine-tuning followed by reinforcement learning with Group Relative Policy Optimization (GRPO). Central to our approach is a topology-aware composite reward that integrates type-aware Hungarian matching for structured error classification, spatial localization scoring, and a centerline Dice (clDice) reward that directly penalizes connectivity disruptions, thereby jointly incentivizing semantic precision and structural fidelity. Extensive experiments demonstrate that Topo-R1 establishes a new paradigm for annotation-free topological quality assessment, consistently outperforming general-purpose VLMs and supervised baselines across all evaluation protocols.

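Motivation and Objectives

Detect topological errors in segmentation masks across domains without ground-truth annotations.
Develop a topology-aware framework that locates and classifies structural errors in tubular networks.
Build an automated data-curation pipeline that injects verifiable topological anomalies for multi-domain training and benchmarking.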
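
One way to make an injected anomaly "verifiable" is to compare the Betti numbers of a mask before and after the edit: b0 counts connected components and b1 counts loops. The following is a minimal sketch under stated assumptions, not the paper's pipeline: masks are 2D nested lists of 0/1, the foreground uses 8-connectivity and the background 4-connectivity, and the helper names (`betti_numbers`, `_count_components`) are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def _count_components(grid, target, neighbors):
    """Count connected regions of cells equal to `target` using BFS."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_numbers(mask):
    """Return (b0, b1) for a 2D binary mask given as a list of 0/1 rows."""
    w = len(mask[0])
    # Pad with background so everything outside the mask is one region.
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in mask] + [[0] * (w + 2)]
    b0 = _count_components(padded, 1, N8)      # foreground components
    b1 = _count_components(padded, 0, N4) - 1  # enclosed background regions
    return b0, b1

# A closed ring has one component and one loop.
ring = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
# Removing one pixel opens the loop, which mimics a broken connection.
broken = [row[:] for row in ring]
broken[2][4] = 0

before, after = betti_numbers(ring), betti_numbers(broken)
anomaly_verified = before != after
```

Here `before` is `(1, 1)` and `after` is `(1, 0)`, so the edit changed the topology and the injected anomaly would be accepted. An edit that leaves both numbers unchanged would be rejected under this check.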

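Proposed Method

Frames topological anomaly detection as a structured visual reasoning task whose output is a set of typed bounding boxes.
Two-stage training: supervised fine-tuning (SFT) followed by reinforcement learning with Group Relative Policy Optimization (GRPO).
A topology-aware composite reward that combines (i) type-aware Hungarian matching for error classification, (ii) spatial localization scoring, and (iii) a centerline Dice (clDice) reward that penalizes connectivity disruptions.
An automated data-curation pipeline that injects four anomaly types (broken/false connections and missing/extra branches) into multi-domain crops and verifies each change via Betti numbers.
Type-aware Hungarian matching within each group assigns predictions to ground-truth anomalies before the reward is computed.
Evaluation covers multiple backbone VLMs and baselines under zero-shot, SFT-only, and Topo-R1 settings.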
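
The type-aware matching step can be illustrated with a small sketch. It assumes predictions and ground-truth anomalies are `(type_label, box)` pairs with boxes as `(x1, y1, x2, y2)`, and that a pair only scores when the types agree. The type labels and function names are hypothetical, and for brevity the code finds the optimal one-to-one assignment by exhaustive search over permutations rather than the Hungarian algorithm; both return an assignment with the same maximal total score, but exhaustive search is only practical for a handful of boxes.

```python
from itertools import permutations

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def type_aware_match(preds, gts):
    """Return the best one-to-one assignment as (pred_idx, gt_idx, iou) tuples.

    Pairs whose type labels differ score 0 and are dropped from the result.
    """
    if not preds or not gts:
        return []
    # score[i][j] is the IoU when types agree, otherwise 0.
    score = [[iou(p_box, g_box) if p_type == g_type else 0.0
              for g_type, g_box in gts]
             for p_type, p_box in preds]
    swap = len(preds) > len(gts)
    small, large = (len(gts), len(preds)) if swap else (len(preds), len(gts))
    best_pairs, best_total = [], -1.0
    for perm in permutations(range(large), small):
        # k indexes the smaller set, perm[k] the larger one.
        pairs = [(perm[k], k) if swap else (k, perm[k]) for k in range(small)]
        total = sum(score[i][j] for i, j in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    return [(i, j, score[i][j]) for i, j in best_pairs if score[i][j] > 0.0]

# Hypothetical labels for three of the four anomaly types.
preds = [("broken_connection", (10, 10, 30, 30)),
         ("extra_branch", (50, 50, 70, 70))]
gts = [("extra_branch", (52, 52, 72, 72)),
       ("broken_connection", (10, 10, 30, 30)),
       ("missing_branch", (0, 0, 5, 5))]

matches = type_aware_match(preds, gts)
```

In this example prediction 0 is paired with ground truth 1 (identical box, same type) and prediction 1 with ground truth 0 (overlapping box, same type), while the `missing_branch` ground truth stays unmatched. A localization or classification score would then be computed over the matched pairs only.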

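Experimental Results

Research Questions

RQ1: Can vision-language models be given topology-aware perception so that they detect sparse, connectivity-based errors in tubular structures without ground-truth supervision?
RQ2: Does two-stage training (SFT + GRPO) with a topology-specific composite reward improve cross-domain topological anomaly detection and classification?
RQ3: How does automated cross-domain data synthesis with topology verification affect generalization to new domains?
RQ4: How do type-aware matching and the clDice-based reward affect localization and error-type classification performance?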

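Key Findings

Zero-shot VLMs perform close to random on topological anomaly detection.
Supervised fine-tuning teaches the anomaly taxonomy and basic localization, providing a baseline gain.
Topology-aware reinforcement learning (GRPO) with the composite reward yields consistent gains over SFT across backbones, most notably in precision.
Topo-R1 reaches up to 45.2% F1@0.5 with the Qwen3-VL-4B backbone, outperforming baselines and closed-source models under comparable evaluation.
Ablation studies show that the nonlinear tiered reward and type-aware matching substantially outperform a raw IoU reward and linear thresholding on F1 across IoU levels.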
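
For readers unfamiliar with the F1@0.5 notation, the sketch below shows one common way such a score is computed. This review does not spell out the paper's exact protocol, so the details are assumptions: a prediction counts as a true positive only if its type matches a ground-truth anomaly and their box IoU reaches the threshold, and pairs are formed greedily by descending IoU. The function name `f1_at_iou` and the type labels are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def f1_at_iou(preds, gts, thr=0.5):
    """Return (precision, recall, f1) for typed boxes at an IoU threshold."""
    # Candidate pairs are restricted to matching types, best overlap first.
    candidates = sorted(
        ((iou(p_box, g_box), i, j)
         for i, (p_type, p_box) in enumerate(preds)
         for j, (g_type, g_box) in enumerate(gts)
         if p_type == g_type),
        reverse=True,
    )
    used_preds, used_gts, tp = set(), set(), 0
    for score, i, j in candidates:
        if score < thr:
            break  # remaining candidates overlap even less
        if i in used_preds or j in used_gts:
            continue
        used_preds.add(i)
        used_gts.add(j)
        tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

preds = [("broken_connection", (10, 10, 30, 30)),
         ("extra_branch", (50, 50, 70, 70)),
         ("missing_branch", (0, 0, 8, 8))]
gts = [("broken_connection", (10, 10, 30, 30)),
       ("extra_branch", (52, 52, 72, 72)),
       ("false_connection", (0, 0, 8, 8))]

p50, r50, f50 = f1_at_iou(preds, gts, thr=0.5)
p75, r75, f75 = f1_at_iou(preds, gts, thr=0.75)
```

At a threshold of 0.5 two of the three predictions count as correct, giving an F1 of 2/3. At 0.75 the `extra_branch` pair (IoU of about 0.68) no longer qualifies and F1 drops to 1/3. The third prediction never counts because its type is wrong even though its box is exact, which is why type-aware evaluation is stricter than plain box matching.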

This review was generated by AI and reviewed by a human editor.