QUICK REVIEW

[Paper Review] Topo-R1: Detecting Topological Anomalies via Vision-Language Models

Meilong Xu, Qingqiao Hu | arXiv (Cornell University) | Mar 13, 2026

Topological and Geometric Data Analysis | Cited by 0

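One-Sentence Summary

Topo-R1 is a topology-aware vision-language framework that detects and classifies topological errors in tubular structures, trained with reinforcement learning under a purpose-built composite reward and paired with an automatically constructed, multi-domain anomaly-injection benchmark.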

ABSTRACT

Topological correctness is crucial for tubular structures such as blood vessels, nerve fibers, and road networks. Existing topology-preserving methods rely on domain-specific ground truth, which is costly and rarely transfers across domains. When deployed to a new domain without annotations, a key question arises: how can we detect topological anomalies without ground-truth supervision? We reframe this as topological anomaly detection, a structured visual reasoning task requiring a model to locate and classify topological errors in predicted segmentation masks. Vision-Language Models (VLMs) are natural candidates; however, we find that state-of-the-art VLMs perform nearly at random, lacking the fine-grained, topology-aware perception needed to identify sparse connectivity errors in dense structures. To bridge this gap, we develop an automated data-curation pipeline that synthesizes diverse topological anomalies with verifiable annotations across progressively difficult levels, thereby constructing the first large-scale, multi-domain benchmark for this task. We then introduce Topo-R1, a framework that endows VLMs with topology-aware perception via two-stage training: supervised fine-tuning followed by reinforcement learning with Group Relative Policy Optimization (GRPO). Central to our approach is a topology-aware composite reward that integrates type-aware Hungarian matching for structured error classification, spatial localization scoring, and a centerline Dice (clDice) reward that directly penalizes connectivity disruptions, thereby jointly incentivizing semantic precision and structural fidelity. Extensive experiments demonstrate that Topo-R1 establishes a new paradigm for annotation-free topological quality assessment, consistently outperforming general-purpose VLMs and supervised baselines across all evaluation protocols.

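Motivation and Goals

Enable annotation-free detection of topological errors in segmentation masks across domains (e.g., blood vessels, road networks).
Develop a topology-aware perception framework that locates and classifies structural errors in tubular networks.
Build an automated data-curation pipeline that injects verifiable topological anomalies for multi-domain training and benchmarking.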
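
To make "verifiable" concrete: the review states elsewhere that injected anomalies are checked through Betti numbers. The following is a minimal sketch of such a check on a 2D binary mask, not the authors' pipeline. The function names, the 8-connected foreground / 4-connected background convention, and the rule "an injection counts only if (β0, β1) changes" are all assumptions made for illustration.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value` using BFS."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_numbers(mask):
    """Return (beta0, beta1) for a 2D binary mask given as rows of 0/1."""
    cols = len(mask[0])
    # Pad with background so the outside region forms one component.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in mask]
              + [[0] * (cols + 2)])
    beta0 = count_components(padded, 1, N8)      # connected foreground pieces
    beta1 = count_components(padded, 0, N4) - 1  # enclosed holes (loops)
    return beta0, beta1

def anomaly_is_verified(clean, corrupted):
    """Hypothetical acceptance rule: the injection must change the topology."""
    return betti_numbers(clean) != betti_numbers(corrupted)

# A closed loop, and the same loop with one pixel removed (a broken connection).
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
broken_ring = [[1, 0, 1],
               [1, 0, 1],
               [1, 1, 1]]
print(betti_numbers(ring), betti_numbers(broken_ring))
```

Removing the pixel opens the loop, so β1 drops from 1 to 0 while β0 stays at 1, and the injection would be accepted under the assumed rule.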

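Proposed Method

Frame topological anomaly detection as structured visual reasoning with typed bounding-box outputs.
Two-stage training: supervised fine-tuning (SFT) followed by reinforcement learning with Group Relative Policy Optimization (GRPO).
A topology-aware composite reward combining: (i) type-aware Hungarian matching for error classification; (ii) spatial localization scoring; (iii) a centerline Dice (clDice) reward that emphasizes connectivity preservation.
An automated data-curation pipeline that injects four anomaly types (broken connections, false connections, missing branches, extra branches) into multi-domain crops and verifies the resulting change via Betti numbers.
Before the reward is computed, predictions are assigned to ground-truth anomalies by type-aware Hungarian matching within each type group.
Multiple backbone VLMs and baselines are evaluated in zero-shot, SFT-only, and Topo-R1 settings.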
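
The type-aware matching step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: boxes are `(x1, y1, x2, y2)` tuples, the matching cost is plain IoU, the type labels are hypothetical names for the four anomaly types, and the optimal assignment is found by exhaustive search as a stand-in for the Hungarian algorithm. Both give the same optimum on small inputs; a real system would use a proper Hungarian solver such as `scipy.optimize.linear_sum_assignment`, since exhaustive search grows factorially.

```python
from itertools import permutations

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def best_assignment(pred_boxes, gt_boxes):
    """One-to-one pairing with maximum total IoU, as (pred_idx, gt_idx) pairs.

    Exhaustive search: same optimum as the Hungarian algorithm, small inputs only.
    """
    if not pred_boxes or not gt_boxes:
        return []
    swap = len(pred_boxes) > len(gt_boxes)
    small, large = (gt_boxes, pred_boxes) if swap else (pred_boxes, gt_boxes)
    best, best_score = [], -1.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best = score, list(enumerate(perm))
    # When swapped, `i` indexes ground truth and `j` indexes predictions.
    return [(j, i) if swap else (i, j) for i, j in best]

def type_aware_match(preds, gts):
    """Match predictions to ground truth only within the same anomaly type.

    preds, gts: lists of (type_label, box). Returns (pred_idx, gt_idx, iou).
    """
    matches = []
    shared_types = {t for t, _ in preds} & {t for t, _ in gts}
    for t in sorted(shared_types):
        p_idx = [i for i, (pt, _) in enumerate(preds) if pt == t]
        g_idx = [j for j, (gt_type, _) in enumerate(gts) if gt_type == t]
        local = best_assignment([preds[i][1] for i in p_idx],
                                [gts[j][1] for j in g_idx])
        for pi, gj in local:
            i, j = p_idx[pi], g_idx[gj]
            matches.append((i, j, iou(preds[i][1], gts[j][1])))
    return matches

# Hypothetical labels; the paper's exact label strings are not given here.
preds = [("broken_connection", (0, 0, 10, 10)),
         ("extra_branch", (20, 20, 30, 30)),
         ("broken_connection", (50, 50, 60, 60))]
gts = [("broken_connection", (50, 50, 60, 60)),
       ("broken_connection", (0, 0, 10, 10)),
       ("missing_branch", (20, 20, 30, 30))]
matches = type_aware_match(preds, gts)
print(matches)
```

In the example, the `extra_branch` prediction overlaps the `missing_branch` ground truth perfectly but is left unmatched, because matching never crosses type groups. That is the sense in which a correct location with the wrong type earns no credit.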

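Experimental Results

Research Questions

RQ1: Can vision-language models be given topology-aware perception so that they detect sparse, connectivity-based errors in tubular structures without ground-truth supervision?
RQ2: Does two-stage training (SFT + GRPO) with a topology-specific composite reward improve detection and classification of topological anomalies in cross-domain settings?
RQ3: How does automated cross-domain data synthesis, combined with topological verification, affect generalization to new domains?
RQ4: How do type-aware matching and the clDice-based reward affect localization and error-type classification performance?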
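
For readers unfamiliar with the clDice term referenced in RQ4, here is a minimal sketch of the standard clDice score: the harmonic mean of topology precision (the fraction of the predicted skeleton lying inside the reference mask) and topology sensitivity (the fraction of the reference skeleton lying inside the predicted mask). It assumes masks and skeletons are already available as sets of `(row, col)` pixels. Skeletonization itself is out of scope here, and how Topo-R1 turns this score into a reward is not specified in this review.

```python
def cl_dice(pred_mask, ref_mask, pred_skel, ref_skel):
    """clDice between two masks, given their precomputed skeletons.

    All four arguments are sets of (row, col) foreground pixels.
    """
    if not pred_skel or not ref_skel:
        return 0.0
    t_prec = len(pred_skel & ref_mask) / len(pred_skel)  # topology precision
    t_sens = len(ref_skel & pred_mask) / len(ref_skel)   # topology sensitivity
    if t_prec + t_sens == 0:
        return 0.0
    return 2 * t_prec * t_sens / (t_prec + t_sens)

# A one-pixel-wide line is its own skeleton, which keeps the example simple.
ref_line = {(0, c) for c in range(10)}
gapped_line = ref_line - {(0, 4), (0, 5)}  # a two-pixel break in the line

score_same = cl_dice(ref_line, ref_line, ref_line, ref_line)
score_gap = cl_dice(gapped_line, ref_line, gapped_line, ref_line)
print(score_same, score_gap)
```

The gapped line keeps perfect topology precision, since every predicted skeleton pixel lies on the reference, but loses sensitivity, because two of the ten reference skeleton pixels are missed.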

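Key Findings

Zero-shot VLMs perform close to random on topological anomaly detection.
Supervised fine-tuning provides a foundational gain by teaching anomaly classification and basic localization.
Topology-aware reinforcement learning (GRPO) with the composite reward yields consistent improvements over SFT across backbones, especially in precision.
With a Qwen3-VL-4B backbone, Topo-R1 reaches up to 45.2% F1@0.5 under comparable evaluation, outperforming the baselines and the closed-source models it is compared against.
An ablation study shows that the nonlinear, tiered reward and type-aware matching substantially outperform a raw IoU reward and linear thresholding in F1 across IoU thresholds.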
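
As a reading aid for the F1@0.5 figure, the sketch below shows one common way to compute F1 at an IoU threshold from already matched prediction/ground-truth pairs. It assumes a matched pair counts as a true positive only if its IoU meets the threshold, and that unmatched predictions and unmatched ground truths count as false positives and false negatives. The paper's exact evaluation protocol may differ, and the function name and arguments are hypothetical.

```python
def f1_at_iou(matched_ious, n_pred, n_gt, threshold=0.5):
    """F1 score where a match is a true positive if its IoU >= threshold.

    matched_ious: IoU of each type-correct matched (prediction, ground truth) pair.
    n_pred, n_gt: total number of predictions and ground-truth anomalies.
    """
    tp = sum(1 for v in matched_ious if v >= threshold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Four predictions, three ground-truth anomalies, three matched pairs.
ious = [0.9, 0.6, 0.3]
f1_strict = f1_at_iou(ious, n_pred=4, n_gt=3, threshold=0.5)
f1_loose = f1_at_iou(ious, n_pred=4, n_gt=3, threshold=0.25)
print(f1_strict, f1_loose)
```

At threshold 0.5, two of the three matches qualify, giving precision 2/4 and recall 2/3. Lowering the threshold to 0.25 admits the third match, which is why F1 is usually reported at several IoU levels.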


This review was generated by AI and reviewed by human editors.