QUICK REVIEW

[Paper Review] SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning

Zihao Zhao, Shengting Cao | arXiv (Cornell University) | Feb 1, 2026

Anomaly Detection Techniques and Applications | Cited by 0

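One-Sentence Summary

SRVAU-R1 builds a reflection-oriented data pipeline and a two-stage learning framework (SFT followed by RFT) for reflection-aware learning on video anomalies, improving the self-reflection and self-correction abilities of multi-modal large language models on VAU tasks.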

ABSTRACT

Multi-modal large language models (MLLMs) have demonstrated significant progress in reasoning capabilities and shown promising effectiveness in video anomaly understanding (VAU) tasks. However, existing MLLM-based approaches remain largely focused on surface-level descriptions of anomalies, lacking deep reasoning over abnormal behaviors like explicit self-reflection and self-correction. To address that, we propose Self-Reflection-Enhanced Reasoning for Video Anomaly Understanding (SRVAU-R1), a reflection-aware learning framework that incorporates reflection in MLLM reasoning. Specifically, SRVAU-R1 introduces the first reflection-oriented Chain-of-Thought dataset tailored for VAU, providing structured supervision with initial reasoning, self-reflection, and revised reasoning. Based on that, it includes a novel reflection-aware learning paradigm with supervised fine-tuning and reinforcement fine-tuning to enhance multi-modal reasoning for VAU. Extensive experiments on multiple video anomaly benchmarks demonstrate that SRVAU-R1 consistently outperforms existing methods, achieving significant improvements in both temporal anomaly localization accuracy and reasoning quality.

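Motivation and Objectives

Move beyond surface-level descriptions toward deep, temporally grounded understanding of complex anomalies.
Give multi-modal large language models explicit self-reflection and self-correction capabilities on VAU tasks.
Provide a reflection-oriented Chain-of-Thought dataset and the accompanying supervision signals.
Develop a two-stage training paradigm (SFT first, then RFT) to improve reasoning quality and robustness.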

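Proposed Method

Build a reflection-oriented data construction pipeline that adds initial reasoning, self-reflection, and revised reasoning signals.
Create a Chain-of-Thought training dataset designed specifically for reflection-enhanced VAU.
Adopt a two-stage learning paradigm: reflection-enhanced supervised fine-tuning (SFT), followed by GRPO-based reflection-aware reinforcement fine-tuning (RFT).
Design a composite reward for RFT with task accuracy, reflection quality, and temporal IoU (tIoU) components.
Introduce the temporal IoU reward to align temporal reasoning with ground-truth anomaly intervals.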
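
The reward design above can be illustrated with a minimal sketch. It assumes that anomaly spans are given as (start, end) pairs in seconds, that reflection quality arrives as a score in [0, 1] from some external judge, and that the three terms are combined as a weighted sum. The names `temporal_iou`, `RewardWeights`, and `composite_reward`, as well as the weight values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[float, float]  # (start, end) in seconds; an assumed representation


def temporal_iou(pred: Span, gt: Span) -> float:
    """Temporal IoU: overlap length divided by the length of the union."""
    (ps, pe), (gs, ge) = pred, gt
    if pe <= ps or ge <= gs:
        return 0.0  # degenerate or inverted interval earns no credit
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the summary does not say how the terms are combined.
    accuracy: float = 1.0
    reflection: float = 0.5
    tiou: float = 1.0


def composite_reward(
    answer_correct: bool,
    reflection_score: float,
    pred_span: Span,
    gt_span: Span,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of task accuracy, reflection quality, and tIoU terms."""
    reflection_score = min(1.0, max(0.0, reflection_score))  # clamp to [0, 1]
    return (
        weights.accuracy * float(answer_correct)
        + weights.reflection * reflection_score
        + weights.tiou * temporal_iou(pred_span, gt_span)
    )


if __name__ == "__main__":
    # Prediction overlaps the ground-truth interval by 5 s out of a 15 s union.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(composite_reward(True, 0.5, (0.0, 10.0), (0.0, 10.0)))
```

A weighted sum is only one way to combine the components; the actual formulation in the paper may use different terms, scaling, or gating.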

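Experimental Results

Research Questions

RQ1: How does explicit self-reflection improve reasoning quality and temporal localization in VAU?
RQ2: Do the reflection-oriented dataset and two-stage training yield robust, generalizable VAU performance across datasets?
RQ3: How do the scale of the reflection data and the choice of teacher model affect VAU-R1 performance?
RQ4: How does GRPO-based reflection-aware reinforcement learning compare with the baselines on VAU tasks?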

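Key Findings

SRVAU-R1 significantly and consistently improves question-answering accuracy and VAU-Eval scores over the baselines on MSAD and UCF-Crime.
SRVAU-R1 also outperforms the baselines on temporal anomaly localization (higher mIoU and recall), with especially pronounced gains on ECVA and MSAD in the out-of-distribution (OOD) setting.
Ablations show that both the reflection data and the two-stage SFT+RFT training are essential; removing the reflection data significantly degrades performance.
Reflection-oriented training produces clear "aha" moments in which the model revises its initial reasoning, improving localization and causal understanding.
Two-step thinking without explicit reflection markers yields limited gains, underscoring the need for explicit self-reflection for robust VAU.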
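
For readers unfamiliar with the localization metrics mentioned above, the following sketch shows one common way to compute mIoU and recall for temporal localization. It assumes one predicted and one ground-truth (start, end) span per video and counts a prediction as a hit when its IoU reaches a threshold. The function names and the 0.5 threshold are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds; an assumed representation


def interval_iou(pred: Span, gt: Span) -> float:
    """IoU of two 1-D intervals; invalid intervals score 0."""
    (ps, pe), (gs, ge) = pred, gt
    if pe <= ps or ge <= gs:
        return 0.0
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    return inter / ((pe - ps) + (ge - gs) - inter)


def localization_metrics(
    preds: Sequence[Span],
    gts: Sequence[Span],
    threshold: float = 0.5,  # assumed IoU threshold for counting a hit
) -> Tuple[float, float]:
    """Return (mIoU, recall at the given IoU threshold) over paired spans."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and the same length")
    ious: List[float] = [interval_iou(p, g) for p, g in zip(preds, gts)]
    miou = sum(ious) / len(ious)
    recall = sum(iou >= threshold for iou in ious) / len(ious)
    return miou, recall


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 4.0)]
    gts = [(0.0, 10.0), (2.0, 10.0)]
    print(localization_metrics(preds, gts))
```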


This review was generated by AI and reviewed by human editors.