QUICK REVIEW

[Paper Digest] BSN: Boundary Sensitive Network for Temporal Action Proposal Generation

Tianwei Lin, Xu Zhao|arXiv (Cornell University)|Jun 8, 2018

Human Pose and Action Recognition | References: 40 | Cited by: 95

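One-Sentence Summary

BSN introduces a local-to-global framework: it first detects precise temporal boundaries and actionness at every temporal location, then combines the boundaries into proposals and evaluates them with proposal-level features, achieving high recall and high temporal precision with relatively few proposals.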

ABSTRACT

Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and high proportion irrelevant content. This problem requires methods not only generating proposals with precise temporal boundaries, but also retrieving proposals to cover truth action instances with high recall and high overlap using relatively fewer proposals. To address these difficulties, we introduce an effective proposal generation method, named Boundary-Sensitive Network (BSN), which adopts "local to global" fashion. Locally, BSN first locates temporal boundaries with high probabilities, then directly combines these boundaries as proposals. Globally, with Boundary-Sensitive Proposal feature, BSN retrieves proposals by evaluating the confidence of whether a proposal contains an action within its region. We conduct experiments on two challenging datasets: ActivityNet-1.3 and THUMOS14, where BSN outperforms other state-of-the-art temporal action proposal generation methods with high recall and high temporal precision. Finally, further experiments demonstrate that by combining existing action classifiers, our method significantly improves the state-of-the-art temporal action detection performance.

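Motivation and Objectives

Address the challenge of generating high-quality temporal action proposals in long, untrimmed videos that contain a large share of irrelevant content.
Develop a boundary-aware, local-to-global method that produces precise proposals with flexible durations.
Provide proposal-level confidence evaluation so that high-overlap proposals can be retrieved from relatively few candidates.
Demonstrate gains in proposal quality and in downstream temporal action detection when the proposals are combined with classifiers.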

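Proposed Method

Three-stage BSN architecture: temporal evaluation to produce start, end, and actionness probabilities; proposal generation by combining high-probability boundaries; and proposal evaluation using the Boundary-Sensitive Proposal (BSP) feature.
A three-layer temporal convolutional network (the temporal evaluation module, TEM) outputs p_s (start), p_e (end), and p_a (actionness) at every temporal location.
Candidate proposals are generated by pairing high-p_s and high-p_e locations whose duration falls within the allowed range; the BSP feature is then built by sampling p_a in the center, starting, and ending regions of each candidate.
A multilayer perceptron (the proposal evaluation module, PEM) takes the BSP feature as input and scores each candidate with p_conf; p_conf is then fused with the boundary probabilities at t_s and t_e to obtain the final score p_f.
TEM is trained with a three-task loss over actionness, start, and end; PEM is trained with an IoU-based objective that regresses p_conf toward g_iou, the maximum IoU between a proposal and the ground-truth instances; Soft-NMS is applied at inference to suppress redundant proposals.
The final proposals are output as (t_s, t_e, p_f), optionally including p_s and p_e for analysis.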
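The generation, feature-sampling, score-fusion, and Soft-NMS steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.9 threshold, the margin ratio, the sample counts, the multiplicative fusion, and the Gaussian decay parameters are all assumptions chosen for the example.

```python
import numpy as np

def candidate_boundaries(prob, threshold=0.9):
    """Indices whose probability is high or a local peak (threshold is an assumed value)."""
    prob = np.asarray(prob, dtype=float)
    picked = []
    for t in range(len(prob)):
        is_high = prob[t] > threshold
        is_peak = 0 < t < len(prob) - 1 and prob[t] > prob[t - 1] and prob[t] > prob[t + 1]
        if is_high or is_peak:
            picked.append(t)
    return picked

def generate_proposals(p_s, p_e, max_duration, threshold=0.9):
    """Pair every candidate start with every later candidate end within max_duration."""
    starts = candidate_boundaries(p_s, threshold)
    ends = candidate_boundaries(p_e, threshold)
    return [(ts, te) for ts in starts for te in ends if 0 < te - ts <= max_duration]

def bsp_feature(p_a, ts, te, n_center=16, n_boundary=8, margin=0.2):
    """Sample actionness in the start, center, and end regions by linear interpolation.
    Sample counts and the margin ratio are assumptions for this sketch."""
    p_a = np.asarray(p_a, dtype=float)
    grid = np.arange(len(p_a))
    d = te - ts
    def sample(lo, hi, n):
        # np.interp clamps positions outside the sequence to the edge values
        return np.interp(np.linspace(lo, hi, n), grid, p_a)
    return np.concatenate([
        sample(ts - margin * d, ts + margin * d, n_boundary),
        sample(ts, te, n_center),
        sample(te - margin * d, te + margin * d, n_boundary),
    ])

def fuse_score(p_conf, p_s, p_e, ts, te):
    """Assumed multiplicative fusion of confidence and boundary probabilities."""
    return p_conf * p_s[ts] * p_e[te]

def tiou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(proposals, sigma=0.5, theta=0.0, top_k=100):
    """Gaussian Soft-NMS over (t_s, t_e, p_f) tuples: decay overlapping scores, never drop."""
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        ts, te, score = remaining.pop(best)
        kept.append((ts, te, score))
        for p in remaining:
            overlap = tiou((ts, te), (p[0], p[1]))
            if overlap > theta:
                p[2] *= np.exp(-overlap ** 2 / sigma)
    return kept
```

In this sketch the PEM itself is left out; p_conf would come from whatever model scores the BSP feature. Soft-NMS only reorders and rescales proposals, so a heavily overlapping duplicate drops in rank instead of being discarded outright.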

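Experimental Results

Research Questions

RQ1: Can a boundary-centric, local-to-global framework achieve higher recall than prior methods with fewer proposals?
RQ2: Does combining boundary probability signals (start/end) with actionness improve the precision of the proposals' temporal boundaries?
RQ3: When combined with existing classifiers, does the proposal-level BSP feature enable reliable retrieval and higher-quality temporal action detection?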

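Key Findings

On the ActivityNet-1.3 validation set, BSN achieves higher AR@AN and AUC than several state-of-the-art proposal methods.
On THUMOS14, both BSN+Greedy-NMS and BSN+Soft-NMS outperform prior methods across multiple AN settings (e.g., AR@50 through AR@1000), with notable gains when few proposals are used.
On ActivityNet-1.3, BSN generalizes well to unseen action categories, with only a slight performance drop relative to seen categories.
In ablation studies, TEM alone is already effective, PEM provides a significant boost, and the BSP components contribute complementary improvements.
Combining BSN proposals with action classifiers yields competitive or better temporal action detection performance on ActivityNet-1.3 and THUMOS14.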
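For readers unfamiliar with the AR@AN metric cited above, the following sketch shows one way average recall at a fixed number of proposals could be computed for a single video. It is a simplified illustration under stated assumptions: the function names are hypothetical, the tIoU thresholds (0.5 to 0.95 in ten steps) are an assumed default, and real benchmark evaluation aggregates over a whole dataset rather than one video.

```python
import numpy as np

def tiou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_recall(ground_truth, proposals, an, thresholds=None):
    """Recall of the top-`an` scored proposals, averaged over tIoU thresholds.

    ground_truth: list of (start, end); proposals: list of (start, end, score).
    """
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)  # assumed threshold grid
    top = sorted(proposals, key=lambda p: p[2], reverse=True)[:an]
    recalls = []
    for thr in thresholds:
        # a ground-truth instance counts as recalled if any kept proposal overlaps enough
        hits = sum(any(tiou(g, p[:2]) >= thr for p in top) for g in ground_truth)
        recalls.append(hits / len(ground_truth))
    return float(np.mean(recalls))
```

Evaluating this function over a range of `an` values traces the AR-versus-AN curve; the AUC figure summarizes the area under that curve.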


This digest was generated by AI and reviewed by a human editor.