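[Paper Summary] Cascade Feature Aggregation for Human Pose Estimation
The paper proposes Cascade Feature Aggregation (CFA). CFA cascades multiple hourglass networks and aggregates features from different stages, which makes human pose estimation more robust to pose variation, occlusion, and low resolution.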
Human pose estimation plays an important role in many computer vision tasks and has been studied for decades. However, complex appearance variations caused by pose, illumination, occlusion, and low resolution keep it a challenging problem. Taking advantage of the high-level semantic information in deep convolutional neural networks is an effective way to improve the accuracy of human pose estimation. In this paper, we propose a novel Cascade Feature Aggregation (CFA) method, which cascades several hourglass networks for robust human pose estimation. Features from different stages are aggregated to obtain abundant contextual information, which makes the method robust to pose variation, partial occlusion, and low resolution. Moreover, results from different stages are fused to further improve localization accuracy. Extensive experiments on the MPII and LIP datasets demonstrate that CFA outperforms state-of-the-art methods and achieves the best performance on the MPII benchmark.
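Research Motivation and Goals
- Human pose estimation remains difficult because of appearance variation, occlusion, and low resolution.
- Exploit the high-level semantic information of deep CNNs to improve accuracy.
- Build a cascaded system that aggregates features from multiple stages to obtain richer context.
- Fuse results from different stages to improve localization accuracy.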
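Proposed Method
- Introduces Cascade Feature Aggregation (CFA), which cascades several hourglass networks.
- Aggregates features from different stages to capture rich contextual information.
- Fuses results across stages to improve keypoint localization accuracy.
- Relies on deep convolutional networks to provide robust feature representations under challenging conditions.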
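The cascade, aggregate, and fuse steps above can be illustrated with a toy sketch in plain Python. It is not the authors' implementation and rests on several assumptions: `toy_stage` is a hypothetical stand-in for one hourglass network (a fixed 3x3 smoothing kernel rather than a learned encoder-decoder), features from the previous stage are aggregated into the next stage's input by element-wise addition, and the per-stage heatmaps are fused by a normalized weighted average before the keypoint is read off with an argmax. The paper's actual aggregation and fusion operators may differ.

```python
from typing import Callable, List, Sequence, Tuple

Map = List[List[float]]  # a single-channel H x W feature map / heatmap


def add_maps(a: Map, b: Map) -> Map:
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale_map(m: Map, s: float) -> Map:
    """Multiply every entry of a map by a scalar."""
    return [[x * s for x in row] for row in m]


def toy_stage(x: Map) -> Map:
    """Hypothetical stand-in for one hourglass: 3x3 kernel with
    centre weight 0.5 and 0.0625 on each of the 8 neighbours, zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        weight = 0.5 if (di, dj) == (0, 0) else 0.0625
                        total += weight * x[ii][jj]
            out[i][j] = total
    return out


def cascade(image: Map, stages: Sequence[Callable[[Map], Map]]) -> List[Map]:
    """Run the stages in sequence and return every stage's output heatmap.

    Assumed aggregation: each stage after the first receives the input
    plus the previous stage's output (element-wise sum)."""
    heatmaps: List[Map] = []
    prev = None
    for stage in stages:
        stage_in = image if prev is None else add_maps(image, prev)
        prev = stage(stage_in)
        heatmaps.append(prev)
    return heatmaps


def fuse(heatmaps: Sequence[Map], weights: Sequence[float]) -> Map:
    """Assumed fusion: normalized weighted average of the stage heatmaps."""
    if not heatmaps or len(heatmaps) != len(weights):
        raise ValueError("need one weight per heatmap")
    total = sum(weights)
    fused = scale_map(heatmaps[0], weights[0] / total)
    for hm, w in zip(heatmaps[1:], weights[1:]):
        fused = add_maps(fused, scale_map(hm, w / total))
    return fused


def argmax2d(m: Map) -> Tuple[int, int]:
    """(row, col) of the largest value; the first occurrence wins ties."""
    best, best_pos = m[0][0], (0, 0)
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v > best:
                best, best_pos = v, (i, j)
    return best_pos


if __name__ == "__main__":
    img = [[0.0] * 5 for _ in range(5)]
    img[2][3] = 1.0  # a single "joint" response
    stage_maps = cascade(img, [toy_stage, toy_stage, toy_stage])
    print(argmax2d(fuse(stage_maps, [0.2, 0.3, 0.5])))
```

Weighting later stages more heavily in `fuse` is only an illustrative choice. In a trained model, the number of stages, the aggregation operator, and the fusion weights would be design decisions or learned parameters.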
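Experimental Results
Research Questions
- RQ1: Does cascading hourglass networks with multi-stage feature aggregation improve pose estimation accuracy under occlusion and low resolution?
- RQ2: Does combining features from multiple network depths improve robustness to pose variation?
- RQ3: How does CFA compare with state-of-the-art methods on the MPII and LIP datasets?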
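Key Findings
- CFA outperforms state-of-the-art methods on the MPII benchmark.
- Aggregating contextual features makes the method robust to pose variation, partial occlusion, and low resolution.
- Fusing results from different stages further improves localization accuracy.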
This summary was generated by AI and reviewed by a human editor.