QUICK REVIEW

[Paper Review] DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model

Eldar Insafutdinov, Leonid Pishchulin | arXiv (Cornell University) | May 10, 2016

Human Pose and Action Recognition | References: 12 | Cited by: 115

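One-Sentence Summary

DeeperCut improves accuracy while substantially speeding up inference through (1) deep, strong body part detectors; (2) image-conditioned pairwise terms for assembling parts; and (3) an incremental optimization strategy.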

ABSTRACT

The goal of this paper is to advance the state-of-the-art of articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part detectors that generate effective bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms that allow to assemble the proposals into a variable number of consistent body part configurations; and (3) an incremental optimization strategy that explores the search space more efficiently thus leading both to better performance and significant speed-up factors. Evaluation is done on two single-person and two multi-person pose estimation benchmarks. The proposed approach significantly outperforms best known multi-person pose estimation results while demonstrating competitive performance on the task of single person pose estimation. Models and code available at http://pose.mpi-inf.mpg.de

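Motivation and Goals

Improve body part detection with deep learning to generate high-quality bottom-up part proposals.
Introduce image-conditioned pairwise terms that correctly assemble body parts into poses in crowded scenes.
Develop an incremental optimization strategy that substantially speeds up inference without sacrificing accuracy.
Demonstrate state-of-the-art performance on single-person and multi-person pose benchmarks.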

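Proposed Method

Use part detectors based on a very deep residual network (ResNet) in a fully convolutional architecture to produce score maps for body parts.
Adapt the ResNet to keep a fine 8 px stride, using deconvolution and atrous (dilated) convolution to recover the spatial resolution needed for part localization.
Introduce intermediate supervision by adding a part loss layer inside the conv4 block, which improves gradient flow and spatial disambiguation.
Train an image-conditioned pairwise model that regresses, from each part location, the relative positions of the other joints; the resulting features feed a logistic model p(z=1|f, ω) that yields the pairwise costs.
Compute the pairwise costs by comparing the CNN-predicted offsets with the actual offsets between parts, using both the forward and backward directions as well as angle terms.
Optimize the joint selection of body parts and their clustering into distinct individuals with an incremental branch-and-cut ILP solver that sequentially solves several smaller instances.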
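
The pairwise term described above can be illustrated with a minimal sketch. It assumes four features (forward and backward offset errors plus the two corresponding angle errors), a plain logistic model, and a log-odds cost of the form log((1 - p) / p). The function names, the weights, and the bias are hypothetical values chosen for illustration; they are not taken from the paper or its released code.

```python
import math


def pairwise_features(d, d_prime, pred_fwd, pred_bwd):
    """Compare predicted offsets with the actual offset between two detections.

    d, d_prime: (x, y) image locations of two part detections.
    pred_fwd:   offset predicted at d toward the part class of d_prime.
    pred_bwd:   offset predicted at d_prime toward the part class of d.
    Returns [forward distance, backward distance, forward angle, backward angle].
    """
    fwd_actual = (d_prime[0] - d[0], d_prime[1] - d[1])
    bwd_actual = (-fwd_actual[0], -fwd_actual[1])

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def angle_diff(a, b):
        # Absolute angle between two vectors, wrapped into [0, pi].
        diff = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
        return abs(math.atan2(math.sin(diff), math.cos(diff)))

    return [
        dist(fwd_actual, pred_fwd),
        dist(bwd_actual, pred_bwd),
        angle_diff(fwd_actual, pred_fwd),
        angle_diff(bwd_actual, pred_bwd),
    ]


def pairwise_probability(features, weights, bias):
    """Logistic model p(z=1 | f, w): probability that both parts belong to one person."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def pairwise_cost(p, eps=1e-9):
    """Log-odds cost (assumed form): negative when the pair is likely the same person."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log((1.0 - p) / p)


# Hypothetical parameters: larger errors lower the same-person probability.
WEIGHTS = [-0.1, -0.1, -1.0, -1.0]
BIAS = 2.0

# Predictions that agree with the actual offset give a negative (attractive) cost.
f_good = pairwise_features((100, 100), (100, 140), (0, 40), (0, -40))
cost_good = pairwise_cost(pairwise_probability(f_good, WEIGHTS, BIAS))

# A forward prediction pointing the wrong way gives a positive (repulsive) cost.
f_bad = pairwise_features((100, 100), (100, 140), (40, 0), (0, -40))
cost_bad = pairwise_cost(pairwise_probability(f_bad, WEIGHTS, BIAS))
```

In this sketch a negative cost encourages the optimizer to put two detections in the same person cluster, and a positive cost discourages it. Using both directions means a pair is only cheap when each detection predicts the other's location well.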

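Experimental Results

Research Questions

RQ1: How do deep part detectors affect single-person and multi-person pose estimation performance?
RQ2: Can image-conditioned pairwise terms improve the grouping of body part hypotheses into consistent multi-person poses in crowded scenes?
RQ3: Can the incremental optimization strategy reduce runtime in the multi-person setting while maintaining or improving pose accuracy?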

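Key Findings

Part detectors based on a very deep ResNet achieve state-of-the-art PCK/AUC on the LSP and MPII benchmarks, and intermediate supervision provides a further gain.
Image-conditioned pairwise terms markedly improve multi-person pose AP and sharply reduce runtime (for example, from 259,220 s/frame to 1,987 s/frame in one comparison).
In the ablation study, bidirectional pairwise terms combined with angle features achieve the best AP (52.6%) and the lowest runtime (578 s/frame).
Incremental optimization (3 stages) raises AP to 57.6% and lowers the median runtime to 271 s/frame relative to the single-stage baseline.
DeeperCut outperforms the DeepCut baseline and a strong two-stage baseline while cutting runtime by orders of magnitude.
On the MPII Multi-Person dataset, DeeperCut with incremental optimization reaches 69.7% AP on the subset and 59.4% AP on the full dataset, with substantially lower runtime.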
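
To put the runtime figures quoted above in perspective, the short calculation below converts the one explicitly paired comparison (259,220 s/frame versus 1,987 s/frame) into a speed-up factor and into more readable units. It assumes both numbers are in seconds per frame, as stated in the summary, and that they come from the same evaluation setting; the variable names are illustrative.

```python
# Runtime figures quoted in the summary, assumed to be seconds per frame.
baseline_s = 259_220
improved_s = 1_987

speedup = baseline_s / improved_s
baseline_hours = baseline_s / 3600
improved_minutes = improved_s / 60

print(f"speed-up: {speedup:.1f}x")                   # speed-up: 130.5x
print(f"baseline: {baseline_hours:.1f} hours/frame")  # baseline: 72.0 hours/frame
print(f"improved: {improved_minutes:.1f} min/frame")  # improved: 33.1 min/frame
```

A factor of roughly 130 is consistent with the "orders of magnitude" wording used for the overall runtime reduction.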

This review was generated by AI and reviewed by a human editor.