[Paper Review] Topo-R1: Detecting Topological Anomalies via Vision-Language Models
Topo-R1 introduces a topology-aware, vision-language framework for detecting and classifying topological errors in tubular structures using reinforcement learning with a specialized composite reward and an automated, multi-domain anomaly-injection benchmark.
Topological correctness is crucial for tubular structures such as blood vessels, nerve fibers, and road networks. Existing topology-preserving methods rely on domain-specific ground truth, which is costly and rarely transfers across domains. When deployed to a new domain without annotations, a key question arises: how can we detect topological anomalies without ground-truth supervision? We reframe this as topological anomaly detection, a structured visual reasoning task requiring a model to locate and classify topological errors in predicted segmentation masks. Vision-Language Models (VLMs) are natural candidates; however, we find that state-of-the-art VLMs perform nearly at random, lacking the fine-grained, topology-aware perception needed to identify sparse connectivity errors in dense structures. To bridge this gap, we develop an automated data-curation pipeline that synthesizes diverse topological anomalies with verifiable annotations across progressively difficult levels, thereby constructing the first large-scale, multi-domain benchmark for this task. We then introduce Topo-R1, a framework that endows VLMs with topology-aware perception via two-stage training: supervised fine-tuning followed by reinforcement learning with Group Relative Policy Optimization (GRPO). Central to our approach is a topology-aware composite reward that integrates type-aware Hungarian matching for structured error classification, spatial localization scoring, and a centerline Dice (clDice) reward that directly penalizes connectivity disruptions, thereby jointly incentivizing semantic precision and structural fidelity. Extensive experiments demonstrate that Topo-R1 establishes a new paradigm for annotation-free topological quality assessment, consistently outperforming general-purpose VLMs and supervised baselines across all evaluation protocols.
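For readers unfamiliar with the centerline Dice term mentioned above, the standard clDice definition from the segmentation literature can be written as follows. The symbols are assumptions introduced here, not notation from the paper: V_P and V_L denote the predicted and reference masks, and S_P and S_L their skeletons (centerlines). How Topo-R1 converts this score into a reward is not specified in this review.

```latex
\begin{align}
  \mathrm{Tprec}(S_P, V_L) &= \frac{|S_P \cap V_L|}{|S_P|}, \\
  \mathrm{Tsens}(S_L, V_P) &= \frac{|S_L \cap V_P|}{|S_L|}, \\
  \mathrm{clDice}(V_P, V_L) &= \frac{2 \cdot \mathrm{Tprec}(S_P, V_L) \cdot \mathrm{Tsens}(S_L, V_P)}
                                   {\mathrm{Tprec}(S_P, V_L) + \mathrm{Tsens}(S_L, V_P)}.
\end{align}
```

Because both terms are computed on skeletons, a single-pixel break in a thin branch lowers the score far more than it would lower an ordinary Dice score, which is why the measure is sensitive to connectivity.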
Motivation & Objective
- Motivate annotation-free detection of topological errors in segmentation masks across domains (e.g., vessels, roads).
- Develop a topology-aware perception framework to locate and classify structural errors in tubular networks.
- Create an automated data-curation pipeline that injects verifiable topological anomalies for multi-domain training and benchmarking.
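The review states under the proposed method that injected changes are verified via Betti numbers. A minimal sketch of what such a check could look like for a 2D binary mask is given below. It assumes 8-connectivity for the foreground and 4-connectivity for the background, and the helper names (`count_components`, `betti_numbers`) are hypothetical, not taken from the paper.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value` using BFS."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(mask):
    """Return (beta0, beta1): connected components and holes of a 2D mask."""
    beta0 = count_components(mask, 1, N8)
    # Pad with background so the region outside the mask is one component;
    # every additional background component is an enclosed hole (a loop).
    width = len(mask[0]) + 2
    padded = [[0] * width] + [[0] + list(row) + [0] for row in mask] + [[0] * width]
    beta1 = count_components(padded, 0, N4) - 1
    return beta0, beta1


# A straight "vessel" and a copy with one pixel removed (a broken connection).
clean = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
broken = [row[:] for row in clean]
broken[1][3] = 0

# The injection is "verified" if the Betti numbers actually changed.
injection_changed_topology = betti_numbers(clean) != betti_numbers(broken)
```

In this toy case the component count rises from one to two after the break, so the injected anomaly is confirmed to be a genuine topological change rather than a cosmetic edit.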
Proposed method
- Frame topological anomaly detection as structured visual reasoning with typed bounding-box outputs.
- Two-stage training: supervised fine-tuning (SFT) followed by reinforcement learning with Group Relative Policy Optimization (GRPO).
- Design a topology-aware composite reward combining: (i) type-aware Hungarian matching for error classification; (ii) spatial localization scoring; (iii) a centerline Dice (clDice)-based reward to emphasize connectivity preservation.
- Build an automated data-curation pipeline that injects four anomaly types (broken connections, spurious connections, missing branches, and extra branches) into multi-domain crops and verifies the resulting topological changes via Betti numbers.
- Apply type-aware, within-group Hungarian matching to assign predictions to ground truths before computing rewards.
- Evaluate across zero-shot, SFT-only, and Topo-R1 settings on multiple backbone VLMs and baselines.
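The type-aware matching step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: boxes are `(x1, y1, x2, y2)` tuples, the type labels are hypothetical strings, and the optimal one-to-one assignment (which the Hungarian algorithm computes efficiently) is found here by brute force over permutations, which is only practical for the handful of boxes in a single crop.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_assignment(pred_boxes, gt_boxes):
    """Return (pred_idx, gt_idx) pairs maximizing total IoU, one-to-one."""
    swap = len(pred_boxes) > len(gt_boxes)
    small, large = (gt_boxes, pred_boxes) if swap else (pred_boxes, gt_boxes)
    best_pairs, best_score = [], -1.0
    # Brute-force stand-in for the Hungarian algorithm (same optimum).
    for perm in permutations(range(len(large)), len(small)):
        score = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(j, i) if swap else (i, j) for i, j in enumerate(perm)]
    return best_pairs


def type_aware_match(preds, gts):
    """Match predictions to ground truths only within the same error type.

    preds and gts are lists of (type, box). Returns (pred_idx, gt_idx, iou).
    """
    matches = []
    for t in sorted({t for t, _ in preds} | {t for t, _ in gts}):
        p_idx = [i for i, (pt, _) in enumerate(preds) if pt == t]
        g_idx = [j for j, (gt_type, _) in enumerate(gts) if gt_type == t]
        pairs = best_assignment([preds[i][1] for i in p_idx],
                                [gts[j][1] for j in g_idx])
        for a, b in pairs:
            i, j = p_idx[a], g_idx[b]
            matches.append((i, j, iou(preds[i][1], gts[j][1])))
    return matches


gts = [("broken_connection", (10, 10, 20, 20)),
       ("extra_branch", (40, 40, 60, 60))]
preds = [("extra_branch", (42, 40, 60, 60)),
         ("broken_connection", (10, 10, 20, 20)),
         ("broken_connection", (100, 100, 110, 110))]
matches = type_aware_match(preds, gts)
```

Restricting the assignment to same-type pairs means a well-localized box with the wrong label can never earn localization credit, which is one plausible reading of how the reward couples classification with localization.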
Experimental results
Research questions
- RQ1: Can vision-language models be endowed with topology-aware perception for detecting sparse, connectivity-based errors in tubular structures without ground-truth supervision?
- RQ2: Does two-stage training (SFT + GRPO) with a topology-specific composite reward improve detection and classification of topological anomalies across domains?
- RQ3: How does automated, cross-domain data synthesis with topology verification impact generalization to new domains?
- RQ4: What is the impact of type-aware matching and clDice-based rewards on localization and error-type classification performance?
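As background for RQ2, GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization step. The function name, the use of the population standard deviation, and the `eps` constant are assumptions made for illustration; the reward values stand in for whatever the composite reward would return.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical composite rewards for four responses to one image.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Responses scoring above the group mean receive positive advantages and are reinforced; when all responses score the same, every advantage is zero and the group contributes no gradient signal.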
Key findings
- Zero-shot VLMs perform near random on topological anomaly detection.
- Supervised fine-tuning provides foundational gains by teaching anomaly taxonomy and basic localization.
- Topology-aware reinforcement learning (GRPO) with the composite reward yields consistent gains over SFT across backbones, especially in precision.
- Topo-R1 with the Qwen3-VL-4B backbone achieves up to 45.2% F1@0.5, outperforming baselines and closed-source models under similar evaluations.
- An ablation study shows that the non-linear, tiered reward and type-aware matching significantly outperform raw IoU rewards and linear thresholding in F1 across IoU levels.
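To make the F1@0.5 figure concrete, the sketch below shows one common way such a metric is computed for typed boxes. The exact evaluation protocol is not given in this review, so the details are assumptions: a prediction counts as a true positive only if it shares the ground-truth type and reaches the IoU threshold, and pairs are resolved greedily by descending IoU. The function names and labels are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds, gts, thr=0.5):
    """Return (precision, recall, f1) for lists of (type, box) items."""
    # Candidate pairs must agree on error type; sort best overlap first.
    candidates = sorted(
        ((iou(pb, gb), i, j)
         for i, (pt, pb) in enumerate(preds)
         for j, (gt_type, gb) in enumerate(gts)
         if pt == gt_type),
        reverse=True,
    )
    used_preds, used_gts, tp = set(), set(), 0
    for score, i, j in candidates:
        if score < thr:
            break  # remaining pairs overlap even less
        if i in used_preds or j in used_gts:
            continue  # each box may be matched at most once
        used_preds.add(i)
        used_gts.add(j)
        tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gts = [("broken_connection", (10, 10, 20, 20)),
       ("extra_branch", (40, 40, 60, 60))]
preds = [("extra_branch", (42, 40, 60, 60)),
         ("broken_connection", (10, 10, 20, 20)),
         ("broken_connection", (100, 100, 110, 110))]
precision, recall, f1 = f1_at_iou(preds, gts, thr=0.5)
```

In this toy example both ground truths are found but one prediction is spurious, so recall is perfect while precision drops to two thirds. This is the kind of false-positive penalty that the reported precision gains from reinforcement learning would reduce.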
This review was created by AI and reviewed by human editors.