QUICK REVIEW

[Paper Review] Seq-NMS for Video Object Detection

Wei Han, Pooya Khorrami|arXiv (Cornell University)|Feb 26, 2016

Advanced Neural Network Applications | References: 8 | Cited by: 154

One-Sentence Summary

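Seq-NMS extends the post-processing of single-image detectors by exploiting temporal consistency: it boosts weaker detections using nearby frames and achieves gains close to the state of the art on ImageNet VID in ILSVRC-2015.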

ABSTRACT

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances for doing object detection in a single image. These methods typically contain three phases: (i) object proposal generation (ii) object classification and (iii) post-processing. We propose a modification of the post-processing phase that uses high-scoring object detections from nearby frames to boost scores of weaker detections within the same clip. We show that our method obtains superior results to state-of-the-art single image object detection techniques. Our method placed 3rd in the video object detection (VID) task of the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).

Motivation and Objectives

Motivate improvements to video object detection by leveraging temporal information beyond per-frame processing.
Extend single-image detectors with a post-processing step that aggregates information across adjacent frames.
Demonstrate that Seq-NMS improves mAP on the ImageNet VID dataset and achieves a competitive ranking in the ILSVRC-2015 VID task.

Proposed Method

Construct a video-wide proposal set with region proposals and scores for all frames in a clip.
Link boxes in adjacent frames whose IoU exceeds 0.5, forming a graph whose paths are candidate sequences.
Use dynamic programming to select the highest-scoring sequence of boxes across the clip.
Re-score the selected sequence via a function F (average or max) across its scores.
Remove the selected boxes from the candidate set and suppress remaining boxes in the same frames that overlap them heavily, reducing redundancy.
Repeat the process until no sequences remain.
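
The steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it handles a single class, assumes positive confidence scores, and uses hypothetical names (`seq_nms`, `best_sequence`, `link_thr`, `nms_thr`). The linking threshold of 0.5 comes from the summary above; the within-frame suppression threshold defaults to 0.3 purely as an assumption.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def best_sequence(boxes, scores, alive, link_thr):
    """Dynamic programming over frames: return the (frame, index) path with
    the highest summed score among linked boxes that are still alive."""
    neg_inf = float("-inf")
    best, prev = [], []          # cumulative score and back-pointer per box
    top = (neg_inf, -1, -1)      # (score, frame, index) of the best path end
    for t in range(len(boxes)):
        best.append([neg_inf] * len(boxes[t]))
        prev.append([-1] * len(boxes[t]))
        for i in range(len(boxes[t])):
            if not alive[t][i]:
                continue
            best[t][i] = scores[t][i]
            if t > 0:
                for j in range(len(boxes[t - 1])):
                    linked = alive[t - 1][j] and iou(boxes[t - 1][j], boxes[t][i]) > link_thr
                    if linked and best[t - 1][j] + scores[t][i] > best[t][i]:
                        best[t][i] = best[t - 1][j] + scores[t][i]
                        prev[t][i] = j
            if best[t][i] > top[0]:
                top = (best[t][i], t, i)
    path = []
    _, t, i = top
    while i != -1:               # follow back-pointers to the sequence start
        path.append((t, i))
        i = prev[t][i]
        t -= 1
    return path

def seq_nms(boxes: List[List[Box]], scores: List[List[float]],
            link_thr: float = 0.5, nms_thr: float = 0.3,
            rescore: str = "avg") -> List[Dict[int, float]]:
    """Return, per frame, a dict mapping kept box index -> rescored value."""
    alive = [[True] * len(b) for b in boxes]
    kept: List[Dict[int, float]] = [{} for _ in boxes]
    while True:
        path = best_sequence(boxes, scores, alive, link_thr)
        if not path:             # no boxes left to link
            break
        seq = [scores[t][i] for t, i in path]
        value = max(seq) if rescore == "max" else sum(seq) / len(seq)
        for t, i in path:
            kept[t][i] = value
            alive[t][i] = False
            # Suppress same-frame boxes that overlap the selected box.
            for k in range(len(boxes[t])):
                if alive[t][k] and iou(boxes[t][i], boxes[t][k]) > nms_thr:
                    alive[t][k] = False
    return kept
```

Calling `seq_nms(boxes, scores)` with one list of boxes and one list of scores per frame returns the surviving box indices with their new scores. Passing `rescore="max"` switches the re-scoring function F from the average to the maximum.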

Experimental Results

Research Questions

RQ1: Does incorporating temporal information in post-processing improve detection performance over frame-wise NMS in video object detection?
RQ2: How much can Seq-NMS improve mAP on ImageNet VID across different base networks (ZF vs VGG)?
RQ3: What are the strengths and failure modes of Seq-NMS in scenarios with occlusion, scale changes, or similar objects close together?

Key Findings

Seq-NMS improves mAP over single-image NMS on ImageNet VID when combined with strong detectors like VGG nets.
Seq-NMS (avg) yields 51.5% mAP on initial val and 51.4% on full val with VGG net, outperforming NMS alone.
Seq-NMS (best) achieves 53.6% mAP on initial val and 52.2% on full val with VGG net; on full test, best reaches 48.2–48.7% depending on setup.
Seq-NMS also shows substantial gains for several classes (e.g., motorcycle, turtle, red panda, lizard, sheep).
The method achieved 3rd place in the ImageNet VID task at ILSVRC-2015.
Compared with the top-ranked entries, the temporal information used by Seq-NMS yields gains that compare favorably with those of trajectory- and propagation-based methods in the reported setup.

This review was generated by AI and reviewed by human editors.