QUICK REVIEW

[Paper Review] Cascaded Boundary Regression for Temporal Action Detection

Jiyang Gao, Zhenheng Yang|arXiv (Cornell University)|May 2, 2017

Human Pose and Action Recognition | References: 16 | Citations: 58

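One-Line Summary

The paper introduces Cascaded Boundary Regression (CBR) into a two-stage temporal action detection pipeline, iteratively refining temporal boundaries to achieve state-of-the-art results on THUMOS-14 and TVSeries, especially at high tIoU thresholds.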

ABSTRACT

Temporal action detection in long videos is an important problem. State-of-the-art methods address this problem by applying action classifiers on sliding windows. Although sliding windows may contain an identifiable portion of the actions, they may not necessarily cover the entire action instance, which would lead to inferior performance. We adapt a two-stage temporal action detection pipeline with Cascaded Boundary Regression (CBR) model. Class-agnostic proposals and specific actions are detected respectively in the first and the second stage. CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows. The salient aspect of the refinement process is that, inside each stage, the temporal boundaries are adjusted in a cascaded way by feeding the refined windows back to the system for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and achieve state-of-the-art performance on both datasets. The performance gain is especially remarkable under high IoU thresholds, e.g. mAP@tIoU=0.5 on THUMOS-14 is improved from 19.0% to 31.0%.

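Motivation and Objectives

Motivates precise temporal localization in untrimmed videos beyond what fixed sliding windows can cover.
Proposes a cascaded boundary regression mechanism that refines temporal boundaries step by step within each stage.
Demonstrates the effectiveness of CBR for both temporal action proposal generation and action detection.
Evaluates performance against prior methods on the THUMOS-14 and TVSeries datasets.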

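Proposed Method

Two-stage action detection pipeline: stage 1 generates class-agnostic temporal proposals; stage 2 performs action-specific detection on those proposals.
Unit-level video feature extraction with C3D and two-stream CNN features, using context-augmented clip representations.
Temporal coordinate regression that uses non-parametric unit-level offsets to refine the start and end boundaries.
Cascaded boundary regression within each stage: refined clips are fed back into the same network for further boundary refinement, for K_p steps in the proposal stage and K_d steps in the detection stage.
A multi-task loss combining classification (binary for proposals, multi-class for detection) with L1-based boundary regression, optimized with Adam under the specified hyperparameters.
Training samples are drawn from sliding windows with tIoU-based labeling, so the proposal and detection networks can be trained separately.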
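
As a rough illustration of the cascade described in the list above, the following Python sketch feeds each refined clip back into the same regressor for a fixed number of steps. The names `cascaded_refine` and `toy_regressor`, the additive sign convention for the offsets, the constant score, and the toy target segment are all assumptions made for this example; in the paper the regressor is a trained network, which is not reproduced here.

```python
from typing import Callable, List, Tuple

# A clip is (start, end), measured in unit indices rather than frames.
Clip = Tuple[float, float]
# Hypothetical regressor interface: clip -> (score, start_offset, end_offset).
Regressor = Callable[[Clip], Tuple[float, float, float]]


def cascaded_refine(
    clip: Clip, regressor: Regressor, num_steps: int
) -> Tuple[Clip, List[float]]:
    """Refine a clip num_steps times, feeding each refined clip back in."""
    scores: List[float] = []
    for _ in range(num_steps):
        score, start_offset, end_offset = regressor(clip)
        # Assumed convention: offsets are added to the current boundaries.
        new_clip = (clip[0] + start_offset, clip[1] + end_offset)
        if new_clip[1] <= new_clip[0]:
            break  # stop if the refined clip would be empty or inverted
        clip = new_clip
        scores.append(score)
    return clip, scores


def toy_regressor(clip: Clip) -> Tuple[float, float, float]:
    """Stand-in for a trained network: moves each boundary halfway to a target."""
    target = (10.0, 20.0)  # made-up ground-truth segment for the demo
    start_offset = 0.5 * (target[0] - clip[0])
    end_offset = 0.5 * (target[1] - clip[1])
    return 1.0, start_offset, end_offset  # constant placeholder score


refined, scores = cascaded_refine((6.0, 16.0), toy_regressor, num_steps=3)
print(refined, len(scores))
```

With the toy regressor, three steps move the window from (6, 16) to (9.5, 19.5), whereas a single step would stop at (8, 18). Here `num_steps` plays the role of K_p or K_d.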

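Experimental Results

Research Questions

RQ1: Does non-parametric unit-level temporal coordinate regression outperform parametric and frame-level offsets for boundary refinement?
RQ2: Do cascaded boundary regression steps improve boundary localization and action detection performance compared with single-step regression?
RQ3: How large is the performance gap between CBR and prior methods for both temporal proposal generation and action detection on THUMOS-14 and TVSeries?
RQ4: How much do different feature types (C3D vs. two-stream) affect localization accuracy?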

Key Findings

tIoU (mAP %, THUMOS-14)	Oneata et al. 2014	Yeung et al. 2016	Yuan et al. 2016	S-CNN 2016	CBR-C3D	CBR-TS
0.1	36.6	48.9	51.4	47.7	48.2	60.1
0.2	33.6	44.0	42.6	43.5	44.3	56.7
0.3	27.0	36.0	33.6	36.3	37.7	50.1
0.4	20.8	26.4	26.1	28.7	30.1	41.3
0.5	14.4	17.1	18.8	19.0	22.7	31.0
0.6	8.5	-	-	10.3	13.8	19.1
0.7	3.2	-	-	5.3	7.9	9.9
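
The tIoU thresholds in the table above decide how much a predicted segment must overlap a ground-truth segment before it can count as correct. The sketch below shows the overlap computation only; the function names `temporal_iou` and `is_match` and the example segments are illustrative assumptions, and a full mAP evaluation would also need class matching, ranking by score, and one-to-one assignment, none of which is shown.

```python
from typing import Tuple

# A segment is (start, end) on any consistent time axis (seconds, frames, units).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Temporal intersection over union of two segments."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred: Segment, gt: Segment, threshold: float) -> bool:
    """True if the prediction overlaps the ground truth at or above the threshold."""
    return temporal_iou(pred, gt) >= threshold


ground_truth = (10.0, 20.0)  # made-up action instance
loose = (15.0, 25.0)         # covers only half of the action
tight = (12.0, 20.0)         # boundaries close to the action

for threshold in (0.3, 0.5, 0.7):
    print(threshold, is_match(loose, ground_truth, threshold),
          is_match(tight, ground_truth, threshold))
```

In this made-up case the loosely placed window passes only the 0.3 threshold while the tighter one passes all three, which is why boundary refinement matters most in the lower rows of the table.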

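Unit-level non-parametric temporal offsets outperform parametric and frame-level approaches for boundary regression.
Cascaded boundary regression improves proposal AR@F=1.0 and detection mAP@tIoU=0.5 over the non-cascaded baseline, with the best results at intermediate cascade depths (e.g., K_p=3 for proposals and K_d=2 for detection).
CBR with two-stream features achieves state-of-the-art AR@F=1.0 and mAP@tIoU=0.5 on THUMOS-14, substantially outperforming prior methods at high tIoU thresholds.
On THUMOS-14, CBR-C3D and CBR-TS outperform SCNN-prop and TURN across various metrics, and for detection CBR-TS reaches 31.0% mAP@tIoU=0.5.
On TVSeries, cascaded regression yields large gains over the non-regression baseline, and CBR-TS outperforms the earlier FV and SVM-TS approaches at several tIoU settings.
The results indicate that CBR is highly effective for both proposal generation and action detection across challenging datasets.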


This review was generated by AI and checked by a human editor.