QUICK REVIEW

[Paper Review] Cascaded Boundary Regression for Temporal Action Detection

Jiyang Gao, Zhenheng Yang|arXiv (Cornell University)|May 2, 2017

Human Pose and Action Recognition | References: 16 | Cited by: 58

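One-Sentence Summary

The paper introduces Cascaded Boundary Regression (CBR) into a two-stage temporal action detection pipeline, iteratively refining action boundaries to achieve state-of-the-art results on THUMOS-14 and TVSeries, especially at higher tIoU thresholds.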

ABSTRACT

Temporal action detection in long videos is an important problem. State-of-the-art methods address this problem by applying action classifiers on sliding windows. Although sliding windows may contain an identifiable portion of the actions, they may not necessarily cover the entire action instance, which would lead to inferior performance. We adapt a two-stage temporal action detection pipeline with a Cascaded Boundary Regression (CBR) model. Class-agnostic proposals and specific actions are detected respectively in the first and the second stage. CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows. The salient aspect of the refinement process is that, inside each stage, the temporal boundaries are adjusted in a cascaded way by feeding the refined windows back to the system for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and achieve state-of-the-art performance on both datasets. The performance gain is especially remarkable under high IoU thresholds, e.g. mAP@tIoU=0.5 on THUMOS-14 is improved from 19.0% to 31.0%.
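
The abstract's central observation, that a sliding window may overlap an action without covering it, can be made concrete with temporal IoU (tIoU), the overlap measure behind the thresholds quoted above. The following is a minimal sketch, not code from the paper; the function name `temporal_iou` and the example segments are illustrative assumptions.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU of two (start, end) segments given in the same time unit."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Length of the overlap; zero when the segments are disjoint.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative values only (not taken from any dataset).
ground_truth = (10.0, 20.0)    # annotated action instance
sliding_window = (14.0, 22.0)  # covers only part of the action
refined_window = (10.5, 20.5)  # the same window after boundary refinement

print(temporal_iou(sliding_window, ground_truth))            # 0.5
print(round(temporal_iou(refined_window, ground_truth), 3))  # 0.905
```

In this toy case the unrefined window sits right at a tIoU of 0.5, while the refined one stays a match under much stricter thresholds, which is the regime where the abstract reports the largest gains.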

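Motivation and Objectives

Push toward precise temporal localization in untrimmed videos, beyond what fixed sliding windows can cover.
Propose a cascaded boundary regression mechanism that progressively refines temporal boundaries within each stage.
Demonstrate the effectiveness of CBR for both temporal action proposal generation and action detection.
Evaluate performance against prior methods on the THUMOS-14 and TVSeries datasets.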

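Proposed Method

Two-stage action detection pipeline: stage 1 generates class-agnostic temporal proposals; stage 2 performs class-specific detection on those proposals.
Unit-level video features are extracted with C3D and two-stream CNNs and combined into context-augmented clip representations.
Temporal coordinate regression with non-parameterized, unit-level offsets refines the start and end boundaries.
Cascaded boundary regression within each stage: refined clips are fed back into the same network for further boundary refinement (K_p times in the proposal stage, K_d times in the detection stage).
A multi-task loss combines classification (binary for proposals, multi-class for detection) with L1-based boundary regression, optimized with Adam under the given hyperparameters.
Training samples are drawn from sliding windows and labeled by tIoU, so the proposal and detection networks can be trained separately.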
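
The cascade and the multi-task loss described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `make_toy_regressor` is a hypothetical stand-in for the trained network, offsets are assumed to be added directly to boundaries expressed in unit indices, label 0 is assumed to mean background, and `reg_weight` is an assumed name for the weight that balances the two loss terms.

```python
import numpy as np


def apply_unit_offsets(window, offsets):
    """Shift a (start, end) window, in unit indices, by predicted offsets.

    Non-parameterized: the offsets are added directly, with no
    normalization by window length or center (assumed sign convention).
    """
    new_start = window[0] + offsets[0]
    new_end = window[1] + offsets[1]
    if new_end <= new_start:  # reject a degenerate refinement
        return window
    return (new_start, new_end)


def cascaded_refine(window, predict_offsets, num_steps):
    """Feed the refined window back into the same regressor num_steps times."""
    history = [window]
    for _ in range(num_steps):
        window = apply_unit_offsets(window, predict_offsets(window))
        history.append(window)
    return history


def make_toy_regressor(target):
    """Hypothetical regressor: moves each boundary halfway toward `target`."""
    def predict(window):
        return (0.5 * (target[0] - window[0]), 0.5 * (target[1] - window[1]))
    return predict


def multitask_loss(class_probs, label, pred_offsets, gt_offsets, reg_weight=1.0):
    """Per-sample cross-entropy plus weighted L1 boundary regression.

    The regression term is applied to positive samples only
    (label 0 is assumed to be background).
    """
    cls_loss = -np.log(class_probs[label])
    reg_loss = 0.0
    if label != 0:
        reg_loss = np.abs(np.asarray(pred_offsets) - np.asarray(gt_offsets)).sum()
    return float(cls_loss + reg_weight * reg_loss)


# K_p = 3 cascade steps on an illustrative window.
history = cascaded_refine((14.0, 22.0), make_toy_regressor((10.0, 20.0)), num_steps=3)
print(history)  # [(14.0, 22.0), (12.0, 21.0), (11.0, 20.5), (10.5, 20.25)]
```

Each pass starts from a window that already overlaps the action better than the previous one, which is the intuition for why a few cascade steps can help more than a single regression step.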

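Experimental Results

Research Questions

RQ1: Does non-parameterized, unit-level temporal coordinate regression refine boundaries better than parameterized and frame-level offsets?
RQ2: Do cascaded boundary regression steps improve boundary localization and action detection performance over single-step regression?
RQ3: How does CBR compare with prior methods on temporal proposal generation and action detection on THUMOS-14 and TVSeries?
RQ4: How does CBR affect localization accuracy with different feature types (C3D vs. two-stream)?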

Key Findings

Detection mAP (%) on THUMOS-14 at different tIoU thresholds ("-" means not reported):

tIoU	Oneata et al. 2014	Yeung et al. 2016	Yuan et al. 2016	S-CNN 2016	CBR-C3D	CBR-TS
0.1	36.6	48.9	51.4	47.7	48.2	60.1
0.2	33.6	44.0	42.6	43.5	44.3	56.7
0.3	27.0	36.0	33.6	36.3	37.7	50.1
0.4	20.8	26.4	26.1	28.7	30.1	41.3
0.5	14.4	17.1	18.8	19.0	22.7	31.0
0.6	8.5	-	-	10.3	13.8	19.1
0.7	3.2	-	-	5.3	7.9	9.9
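
As a quick check on how the gap changes with the threshold, the S-CNN and CBR-TS columns of the table can be compared directly. The sketch below reads the table as THUMOS-14 detection mAP in percent, an assumption consistent with the 19.0% and 31.0% figures quoted in the abstract; the variable names are illustrative.

```python
# Values copied from the S-CNN 2016 and CBR-TS columns of the table above.
tiou_thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
s_cnn = [47.7, 43.5, 36.3, 28.7, 19.0, 10.3, 5.3]
cbr_ts = [60.1, 56.7, 50.1, 41.3, 31.0, 19.1, 9.9]

absolute_gains = [ours - base for base, ours in zip(s_cnn, cbr_ts)]
relative_gains = [100.0 * gain / base for gain, base in zip(absolute_gains, s_cnn)]

for t, abs_gain, rel_gain in zip(tiou_thresholds, absolute_gains, relative_gains):
    print(f"tIoU={t:.1f}: +{abs_gain:.1f} mAP points ({rel_gain:.1f}% relative)")
```

The absolute gain peaks around tIoU 0.3 and shrinks at the strictest thresholds, but the relative gain rises steadily, from roughly 26% at tIoU 0.1 to roughly 87% at tIoU 0.7.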

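Unit-level, non-parameterized temporal offsets outperform parameterized and frame-level alternatives for boundary regression.
Cascaded boundary regression improves proposal AR@F=1.0 and detection mAP@tIoU=0.5 over the non-cascaded baseline, with the best results at a moderate cascade depth (K_p=3 for the proposal stage, K_d=2 for the detection stage).
CBR with two-stream features reaches state-of-the-art AR@F=1.0 and mAP@tIoU=0.5 on THUMOS-14 and clearly outperforms prior methods at high tIoU thresholds.
On THUMOS-14, CBR-C3D and CBR-TS outperform SCNN-prop and TURN on multiple metrics, and CBR-TS reaches 31.0% detection mAP@tIoU=0.5.
On TVSeries, cascaded regression brings a clear improvement over the no-regression baseline, and CBR-TS outperforms the earlier FV and SVM-TS methods at several tIoU settings.
The results indicate that CBR is effective for both proposal generation and action detection across multiple datasets.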


This review was generated by AI and reviewed by human editors.