QUICK REVIEW

[Paper Review] Learning Feature Pyramids for Human Pose Estimation

Wei Yang, Shuang Li|arXiv (Cornell University)|Aug 3, 2017

Human Pose and Action Recognition|References: 54|Cited by: 62

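One-Sentence Summary

Introduces Pyramid Residual Modules (PRMs) to learn feature pyramids in DCNNs, achieving state-of-the-art pose estimation on the MPII and LSP benchmarks, and provides a theoretically derived initialization for multi-branch networks along with a way to control variance in residual units.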

ABSTRACT

Articulated human pose estimation is a fundamental yet challenging task in computer vision. The difficulty is particularly pronounced in scale variations of human body parts when camera view changes or severe foreshortening happens. Although pyramid methods are widely used to handle scale changes at inference time, learning feature pyramids in deep convolutional neural networks (DCNNs) is still not well explored. In this work, we design a Pyramid Residual Module (PRMs) to enhance the invariance in scales of DCNNs. Given input features, the PRMs learn convolutional filters on various scales of input features, which are obtained with different subsampling ratios in a multi-branch network. Moreover, we observe that it is inappropriate to adopt existing methods to initialize the weights of multi-branch networks, which achieve superior performance than plain networks in many tasks recently. Therefore, we provide theoretic derivation to extend the current weight initialization scheme to multi-branch network structures. We investigate our method on two standard benchmarks for human pose estimation. Our approach obtains state-of-the-art results on both benchmarks. Code is available at https://github.com/bearpaw/PyraNet.

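Motivation and Objectives

Address scale variation and foreshortening in articulated human pose estimation.
Propose the Pyramid Residual Module to learn multi-scale feature pyramids in DCNNs.
Provide a theoretically grounded initialization scheme for multi-branch networks.
Mitigate the growth of activation variance in Hourglass/ResNet-like architectures.
Demonstrate state-of-the-art performance on MPII and LSP, supported by comparative ablations.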

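Proposed Method

Design the Pyramid Residual Module (PRM), which learns multi-scale feature pyramids by processing input features at multiple resolutions.
Use fractional max-pooling to build the input feature pyramid, with a controlled subsampling ratio at each pyramid level.
Integrate PRMs into the stacked Hourglass network in place of single-scale residual units.
Extend weight initialization to multi-branch networks by deriving variance-based scaling that keeps forward and backward propagation stable.
Identify the output variance accumulation caused by identity mappings in residual units, and reduce it by replacing the identity skip connection with a 1x1 convolution + BN + ReLU.
Run extensive experiments on MPII and LSP, with ablations over PRM variants, pyramid scales, and initialization, plus additional experiments on CIFAR-10.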
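
The motivation for a branch-aware initialization can be illustrated with a small simulation. This is a minimal sketch, not the paper's derivation: it assumes linear branches (no activation), zero-mean Gaussian weights, and unit-variance inputs, and the helper names `branch_output` and `summed_output_variance` are hypothetical. Under those assumptions, summing C branches that each have fan-in n and per-weight variance 1/n should inflate the output variance roughly C-fold, while a per-weight variance of 1/(C*n) should keep it close to 1.

```python
import random
import statistics


def branch_output(x, weight_std, rng):
    """One output unit of a linear branch: dot product of x with fresh Gaussian weights."""
    return sum(xi * rng.gauss(0.0, weight_std) for xi in x)


def summed_output_variance(num_branches, fan_in, weight_var, trials=4000, seed=0):
    """Empirical variance of the sum of `num_branches` independent linear branches."""
    rng = random.Random(seed)
    weight_std = weight_var ** 0.5
    samples = []
    for _ in range(trials):
        # Unit-variance input shared by all branches (an assumption of this sketch).
        x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        samples.append(
            sum(branch_output(x, weight_std, rng) for _ in range(num_branches))
        )
    return statistics.pvariance(samples)


C, n = 4, 16
# Scaling that only accounts for fan-in, as one would use for a single branch.
naive = summed_output_variance(C, n, weight_var=1.0 / n)
# Scaling that also divides by the number of summed branches.
scaled = summed_output_variance(C, n, weight_var=1.0 / (C * n))
print(f"fan-in only: {naive:.2f}  branch-aware: {scaled:.2f}")
```

The same reasoning suggests why summing the outputs of residual units through identity mappings lets variance accumulate with depth: each sum adds the variances of its inputs unless something rescales them.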

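Experimental Results

Research Questions

RQ1: Can learning feature pyramids inside DCNNs with PRMs improve the scale invariance of pose estimation?
RQ2: How should the weights of multi-branch networks be initialized to preserve forward and backward variance in PRMs and Hourglass-style architectures?
RQ3: Does controlling activation variance where residual unit outputs are summed improve the optimization and performance of stacked Hourglass networks?
RQ4: How does the choice of pyramid scales affect pose estimation accuracy on MPII and LSP?
RQ5: Do PRMs generalize to tasks beyond pose estimation, such as CIFAR-10?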

Key Findings

Method	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean
Pishchulin et al. [41]	74.3	49.0	40.8	34.1	36.5	34.4	35.2	44.1
Tompson et al. [52]	95.8	90.3	80.5	74.3	77.6	69.7	62.8	79.6
Carreira et al. [8]	95.7	91.7	81.7	72.4	82.8	73.2	66.4	81.3
Tompson et al. [51]	96.1	91.9	83.9	77.8	80.9	72.3	64.8	82.0
Hu&Ramanan [28]	95.0	91.6	83.0	76.6	81.9	74.5	69.5	82.4
Pishchulin et al. [42]	94.1	90.2	83.4	77.3	82.6	75.7	68.6	82.4
Lifshitz et al. [35]	97.8	93.3	85.7	80.4	85.3	76.6	70.2	85.0
Gkioxari et al. [20]	96.2	93.1	86.7	82.1	85.2	81.4	74.1	86.1
Rafi et al. [43]	97.2	93.9	86.4	81.3	86.8	80.6	73.4	86.3
Insafutdinov et al. [29]	96.8	95.2	89.3	84.4	88.4	83.4	78.0	88.5
Wei et al. [55]	97.8	95.0	88.7	84.0	88.4	82.8	79.4	88.5
Bulat & Tzimiropoulos [5]	97.9	95.1	89.9	85.3	89.4	85.7	81.7	89.7
Newell et al. [39]	98.2	96.3	91.2	87.1	90.1	87.4	83.6	90.9
Ours-A	98.4	96.5	91.9	88.2	91.1	88.6	85.3	91.8
Ours-B	98.5	96.7	92.5	88.7	91.1	88.6	86.0	92.0

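PRMs improve pose estimation accuracy over the baseline, reaching a state-of-the-art mean PCKh@0.5 of 92.0% on MPII (Ours-B; Ours-A reaches 91.8%).
On LSP, PRMs raise the mean PCK@0.2 to 93.9%, outperforming previous methods.
Adding pyramid scales generally improves performance, with notable gains at four to five scales.
A dedicated multi-branch initialization scheme outperforms Xavier and MSR in both convergence speed and final accuracy.
The variance analysis shows that identity mappings amplify activation variance; placing BN-ReLU-Conv on the skip connection stabilizes training and improves results.
CIFAR-10 experiments show that Wide ResNet and ResNeXt architectures augmented with PRMs achieve competitive or better performance.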
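
For readers unfamiliar with the metric behind the MPII numbers, PCKh@0.5 counts a predicted joint as correct when its distance to the ground truth is at most 0.5 times the head segment length. The snippet below is a minimal sketch of that rule on made-up coordinates; the function name `pckh` and the toy data are illustrative assumptions, and the official MPII evaluation applies further conventions (such as visibility handling and its own head-size definition) that are not reproduced here.

```python
import math


def pckh(predictions, ground_truth, head_length, threshold=0.5):
    """Fraction of joints predicted within threshold * head_length of the truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    limit = threshold * head_length
    correct = sum(
        1 for p, g in zip(predictions, ground_truth) if math.dist(p, g) <= limit
    )
    return correct / len(ground_truth)


# Toy example: four joints, head segment length of 20 pixels (limit = 10 pixels).
gt = [(100.0, 100.0), (120.0, 150.0), (90.0, 200.0), (110.0, 260.0)]
pred = [(103.0, 104.0), (120.0, 158.0), (90.0, 215.0), (110.0, 260.0)]
score = pckh(pred, gt, head_length=20.0)
print(score)  # 0.75: the third joint is 15 pixels off, beyond the 10-pixel limit
```

PCK@0.2, as reported for LSP, follows the same pattern with a different reference length (commonly the torso size) and a threshold of 0.2.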


This review was generated by AI and reviewed by human editors.