QUICK REVIEW

[Paper Review] Convolutional Pose Machines

Evan Murray, von Coler, Henrik|arXiv (Cornell University)|Jan 30, 2016

Human Pose and Action Recognition | References: 37 | Cited by: 295

Summary in Brief

Convolutional Pose Machines (CPMs) integrate deep convolutional networks into a sequential pose estimation framework to learn image features and implicit spatial models, producing progressively refined belief maps for body parts without explicit graphical-model inference. They achieve state-of-the-art results on MPII, LSP, and FLIC benchmarks.

ABSTRACT

Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.

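Research Motivation and Objectives

Motivate and design a system that learns implicit long-range spatial dependencies for articulated pose estimation.
Replace hand-crafted features and graphical-model inference with a differentiable convolutional architecture that can be trained end to end.
Address vanishing gradients in deep sequential networks by applying intermediate supervision at every stage.
Demonstrate state-of-the-art accuracy on standard pose benchmarks and analyze training schemes for CPMs.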

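Proposed Method

Replace the pose machine's predictors with a multi-stage convolutional network that predicts a belief map for each part at every stage.
Each stage combines image evidence with the previous stage's belief maps to produce refined belief maps, with a large receptive field over both the image and the belief maps.
Apply intermediate supervision by adding an L2 loss on each stage's belief maps, and train end to end to counter vanishing gradients.
Share image feature maps across stages and progressively enlarge the receptive field to capture long-range relationships between parts.
Evaluate on MPII, LSP, and FLIC, using data augmentation and multi-scale belief-map fusion to obtain the final predictions.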
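The per-stage L2 loss and the multi-scale fusion described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes that each target is a Gaussian peak at the annotated part location, that belief maps from different scales have already been resized to a common size, and that fusion is a simple average. All function names and the `sigma` value are hypothetical.

```python
import math


def gaussian_belief_map(height, width, center, sigma=1.0):
    """Assumed target: a Gaussian peak at the ground-truth (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def l2_loss(predicted, target):
    """Sum of squared differences between two maps of equal size."""
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    )


def intermediate_supervision_loss(stage_predictions, targets):
    """Add one L2 term per stage and per part.

    stage_predictions[t][p] is the belief map for part p from stage t;
    targets[p] is the target map for part p, shared by every stage.
    """
    return sum(
        l2_loss(part_map, targets[p])
        for stage_maps in stage_predictions
        for p, part_map in enumerate(stage_maps)
    )


def fuse_and_decode(maps_per_scale):
    """Average same-sized belief maps (assumed fusion rule), return the peak."""
    n = len(maps_per_scale)
    height, width = len(maps_per_scale[0]), len(maps_per_scale[0][0])
    fused = [
        [sum(m[y][x] for m in maps_per_scale) / n for x in range(width)]
        for y in range(height)
    ]
    return max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda yx: fused[yx[0]][yx[1]],
    )


# Toy usage: one part, two stages; stage 1 predicts nothing, stage 2 is perfect.
target = gaussian_belief_map(5, 5, center=(2, 3))
blank = [[0.0] * 5 for _ in range(5)]
total = intermediate_supervision_loss([[blank], [target]], [target])
peak = fuse_and_decode([target, target])
```

Because every stage contributes its own loss term, each stage receives a direct gradient signal during training rather than relying only on gradients propagated back from the final output.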

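Experimental Results

Research Questions

RQ1: Can a fully differentiable multi-stage convolutional architecture learn an implicit spatial model for pose estimation without graphical-model inference?
RQ2: Does intermediate supervision effectively mitigate vanishing gradients in deep, multi-stage CNNs?
RQ3: How does enlarging the receptive field in later stages affect accuracy on long-range part dependencies?
RQ4: What is the relative benefit of end-to-end joint training in CPMs compared with stage-wise training or training without intermediate supervision?
RQ5: Do CPMs achieve state-of-the-art performance on the MPII, LSP, and FLIC datasets on both high-precision and low-precision metrics?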
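To make the high-precision versus low-precision distinction in RQ5 concrete, here is a small sketch of a PCK-style metric (percentage of correct keypoints). This summary does not name the metric, so treat PCK as an assumption here. The function name, the reference length, and the sample coordinates are all hypothetical.

```python
import math


def pck(predictions, ground_truth, reference_length, threshold):
    """Fraction of keypoints whose error is within threshold * reference_length.

    A small threshold scores only precise localizations (high precision);
    a larger threshold also accepts coarse ones (low precision).
    """
    limit = threshold * reference_length
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(predictions, ground_truth)
        if math.hypot(px - gx, py - gy) <= limit
    )
    return hits / len(ground_truth)


# Toy usage: three keypoints with errors of 0.5, 1.5, and 5.0 pixels.
predicted = [(10.0, 10.5), (20.0, 21.5), (35.0, 30.0)]
annotated = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
strict = pck(predicted, annotated, reference_length=10.0, threshold=0.1)
loose = pck(predicted, annotated, reference_length=10.0, threshold=0.2)
```

With these toy values, the strict threshold accepts one of the three keypoints and the looser threshold accepts two. Sweeping the threshold gives a curve that covers both regimes.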

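Key Findings

CPMs achieve state-of-the-art results on the MPII, LSP, and FLIC datasets.
Intermediate supervision mitigates vanishing gradients and improves learning across layers.
Larger receptive fields in later stages model long-range part interactions better and improve accuracy.
End-to-end joint training with intermediate losses substantially outperforms stage-wise training and training without intermediate supervision.
In the experiments, performance improves markedly with up to five stages, with diminishing returns at the sixth stage.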
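The receptive-field finding above rests on simple arithmetic: each layer widens the field by (kernel size minus one) times the product of the strides before it. The sketch below applies that standard recurrence to a list of (kernel, stride) pairs. The layer lists are illustrative assumptions, not the configuration reported in the paper.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel, stride) layers."""
    field = 1  # a single output unit starts by seeing one input pixel
    jump = 1   # spacing, in input pixels, between adjacent units at this depth
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Toy usage: three stacked 3x3 convolutions with stride 1.
small_stack = receptive_field([(3, 1), (3, 1), (3, 1)])

# Hypothetical stack mixing large kernels with stride-2 pooling.
deep_stack = receptive_field(
    [(9, 1), (2, 2), (9, 1), (2, 2), (9, 1), (2, 2), (5, 1), (9, 1)]
)
```

The three 3x3 layers see a 7-pixel window, while the hypothetical deeper stack reaches 160 pixels. Large kernels and pooling grow the field far faster than stacking small stride-1 layers.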

This review was generated by AI and reviewed by human editors.