QUICK REVIEW

[Paper Review] Am I a Baller? Basketball Skill Assessment using First-Person Cameras.

Gedas Bertasius, Stella X. Yu|arXiv (Cornell University)|Nov 16, 2016

Human Pose and Action Recognition | References: 8 | Cited by: 3

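One-Sentence Summary

This paper proposes a method for assessing a basketball player's performance from first-person video by learning evaluator-specific preferences from labeled video pairs. It detects atomic events with a convolutional LSTM and encodes non-linear spatiotemporal features with a Gaussian mixture model, so the model can accurately predict player rankings without knowing the evaluator's criteria and can also identify the key events that affect performance.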

ABSTRACT

This paper presents a method to assess a basketball player's performance from his/her first-person video. A key challenge lies in the fact that the evaluation metric is highly subjective and specific to a particular evaluator. We leverage the first-person camera to address this challenge. The spatiotemporal visual semantics provided by a first-person view allows us to reason about the camera wearer's actions while he/she is participating in an unscripted basketball game. Our method takes a player's first-person video and provides a player's performance measure that is specific to an evaluator's preference. To achieve this goal, we first use a convolutional LSTM network to detect atomic basketball events from first-person videos. Our network's ability to zoom-in to the salient regions addresses the issue of a severe camera wearer's head movement in first-person videos. The detected atomic events are then passed through the Gaussian mixtures to construct a highly non-linear visual spatiotemporal basketball assessment feature. Finally, we use this feature to learn a basketball assessment model from pairs of labeled first-person basketball videos, for which a basketball expert indicates, which of the two players is better. We demonstrate that despite not knowing the basketball evaluator's criterion, our model learns to accurately assess the players in real-world games. Furthermore, our model can also discover basketball events that contribute positively and negatively to a player's performance.

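Research Motivation and Goals

Address the subjective, evaluator-specific nature of basketball performance assessment in real-world, unscripted games.
Use first-person video to extract spatiotemporal visual semantics that reflect a player's actions and decisions during a game.
Learn a personalized performance assessment model from video pairs labeled by a basketball expert.
Identify which specific basketball events contribute positively or negatively to a player's overall performance score.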

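Proposed Method

Detect atomic basketball events (e.g., dribbling, shooting, passing) in first-person video frames with a convolutional LSTM network.
Apply a spatial attention mechanism that focuses on salient regions, reducing the distortion caused by head movement in first-person video.
Model the temporal dynamics of the detected events with a Gaussian mixture model to produce a non-linear, high-dimensional visual spatiotemporal feature.
Train the performance assessment model on expert-labeled video pairs, where each label indicates which of the two players performed better.
Learn a preference-aware representation aligned with the evaluator's criteria, without specifying the evaluation metric explicitly.
Use the trained model to infer performance scores and to explain how event-level contributions affect the overall assessment.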
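
The Gaussian-mixture encoding step can be illustrated with a minimal sketch. It assumes each frame comes with per-event detection probabilities and a low-dimensional descriptor (for example a normalized time and court position), and it pools the probabilities through a small set of isotropic Gaussian components. The function names, the fixed component means, `sigma`, and the pooling rule are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np


def gaussian_responses(points, means, sigma):
    """Unnormalized isotropic Gaussian response of each point to each component.

    points: (T, D) per-frame descriptors; means: (J, D) component centers.
    Returns an array of shape (T, J).
    """
    sq_dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / sigma ** 2)


def assessment_feature(event_probs, locations, means, sigma=0.25):
    """Pool per-frame event probabilities through Gaussian components.

    event_probs: (T, K) detection probability of K atomic events per frame.
    locations:   (T, D) hypothetical per-frame descriptor (assumption).
    Returns a flat feature of length K * J, averaged over the T frames.
    """
    resp = gaussian_responses(locations, means, sigma)  # (T, J)
    pooled = event_probs.T @ resp                       # (K, J)
    return (pooled / len(event_probs)).ravel()


# Toy check: one frame that sits exactly on the first component's center.
means = np.array([[0.0, 0.0], [1.0, 1.0]])
feature = assessment_feature(
    event_probs=np.array([[1.0, 0.0]]),
    locations=np.array([[0.0, 0.0]]),
    means=means,
    sigma=1.0,
)
print(feature)
```

Each entry of the resulting vector pairs one event type with one Gaussian component, so the feature is non-linear in the frame descriptors even though the pooling itself is a simple weighted sum.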

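Experimental Results

Research Questions

RQ1: Using only first-person video and expert-labeled pairwise comparisons, can a deep learning model accurately assess basketball performance without knowing the evaluator's specific criteria?
RQ2: How well does the model detect and localize meaningful basketball events in unscripted first-person games?
RQ3: To what extent can the model identify the specific on-court actions that contribute positively or negatively to a player's performance score?
RQ4: Does the model generalize to real basketball games with variable camera motion and complex visual scenes?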

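Key Findings

Given only first-person video pairs and expert labels, the model predicts with high accuracy which player performed better, even though the evaluator's preference metric is never made explicit.
Despite significant camera motion, the model successfully detects and localizes atomic basketball events in first-person video.
The Gaussian-mixture feature encoding effectively captures the complex non-linear spatiotemporal patterns relevant to performance assessment.
The model identifies specific events, such as missed shots or bad passes, that lower a player's score, which provides interpretable feedback.
The method generalizes well to real-world unscripted games and is robust to visual noise and dynamic camera motion.
Relying only on the weak supervision of pairwise comparisons, the assessment model learns to align with expert judgment.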
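
Learning from pairwise comparisons alone can be sketched as a logistic ranking model on feature differences. This is a generic Bradley-Terry style formulation run on synthetic data, used here as an assumption for illustration; the paper's actual model, features, and training procedure may differ. All names (`train_pairwise`, `pairwise_accuracy`, `true_w`) are hypothetical.

```python
import numpy as np


def train_pairwise(better, worse, lr=0.5, epochs=500):
    """Fit weights w so that w @ better[i] > w @ worse[i].

    Minimizes the mean logistic loss log(1 + exp(-margin)) with plain
    gradient descent, where margin = w @ (better[i] - worse[i]).
    """
    diff = better - worse                      # (N, D)
    w = np.zeros(diff.shape[1])
    for _ in range(epochs):
        margin = diff @ w                      # (N,)
        # d/dw log(1 + exp(-m)) = -diff / (1 + exp(m))
        grad = -(diff / (1.0 + np.exp(margin))[:, None]).mean(axis=0)
        w -= lr * grad
    return w


def pairwise_accuracy(w, better, worse):
    """Fraction of pairs where the preferred player gets the higher score."""
    return float(((better - worse) @ w > 0).mean())


# Synthetic stand-in for expert labels: a hidden linear preference (assumption).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.5, 0.0])
a = rng.normal(size=(200, 3))
b = rng.normal(size=(200, 3))
a_better = (a @ true_w) > (b @ true_w)
better = np.where(a_better[:, None], a, b)
worse = np.where(a_better[:, None], b, a)

w = train_pairwise(better, worse)
acc = pairwise_accuracy(w, better, worse)
print(w, acc)
```

Under this linear assumption, the sign of each learned weight indicates whether the corresponding feature dimension pushes a player's score up or down, which is one simple way to read event-level contributions.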

This review was generated by AI and checked by a human editor.