QUICK REVIEW

[Paper Review] Detecting Hands and Recognizing Physical Contact in the Wild

Supreeth Narasimhaswamy, Trung Nguyen | arXiv (Cornell University) | Jan 1, 2020

Hand Gesture Recognition Systems | Cited by 3

One-Sentence Summary

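This paper proposes a new Mask-RCNN-based network with two attention mechanisms that jointly detects hands and recognizes their physical contact state in unconstrained images. By combining the outputs of a separate object detector with spatial attention-based feature pooling, the model achieves roughly a 7% relative improvement over a baseline built on vanilla Mask-RCNN. The evaluation uses ContactHands, a new dataset of real-world images annotated with hand locations and contact states.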

ABSTRACT

We investigate a new problem of detecting hands and recognizing their physical contact state in unconstrained conditions. This is a challenging inference task given the need to reason beyond the local appearance of hands. The lack of training annotations indicating which object or parts of an object the hand is in contact with further complicates the task. We propose a novel convolutional network based on Mask-RCNN that can jointly learn to localize hands and predict their physical contact to address this problem. The network uses outputs from another object detector to obtain locations of objects present in the scene. It uses these outputs and hand locations to recognize the hand's contact state using two attention mechanisms. The first attention mechanism is based on the hand and a region's affinity, enclosing the hand and the object, and densely pools features from this region to the hand region. The second attention module adaptively selects salient features from this plausible region of contact. To develop and evaluate our method's performance, we introduce a large-scale dataset called ContactHands, containing unconstrained images annotated with hand locations and contact states. The proposed network, including the parameters of attention modules, is end-to-end trainable. This network achieves approximately 7% relative improvement over a baseline network that was built on the vanilla Mask-RCNN architecture and trained for recognizing hand contact states.
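The abstract describes the two attention modules only in words. The NumPy sketch below is a rough illustration of what they could look like, built on two assumptions. The first module is treated as non-local-style scaled dot-product attention: every hand position attends over every position of the hand-object region and pools those features back onto the hand grid. For the second module, a squeeze-and-excitation-style channel gate is substituted. The names `affinity_pool` and `salient_gate`, the residual fusion, the 7x7 and 14x14 grid sizes, and the gating design are all hypothetical choices for this sketch. None of them is the paper's formulation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def affinity_pool(hand_feats: np.ndarray, region_feats: np.ndarray) -> np.ndarray:
    """Attention 1 (hypothetical sketch): dense affinity-based pooling.

    hand_feats:   (Nh, C) flattened hand RoI features
    region_feats: (Nr, C) flattened features of the region enclosing hand and object
    """
    c = hand_feats.shape[1]
    affinity = hand_feats @ region_feats.T / np.sqrt(c)  # (Nh, Nr) scaled dot products
    weights = softmax(affinity, axis=1)                    # each hand position's weights sum to 1
    pooled = weights @ region_feats                        # (Nh, C) region features pooled onto the hand grid
    return hand_feats + pooled                             # residual fusion (an assumption)


def salient_gate(feats: np.ndarray, w_down: np.ndarray, w_up: np.ndarray) -> np.ndarray:
    """Attention 2 (hypothetical stand-in): squeeze-and-excitation-style channel gate.

    w_down: (C // r, C), w_up: (C, C // r) for some reduction ratio r.
    """
    squeezed = feats.mean(axis=0)                # (C,) average over all positions
    hidden = np.maximum(w_down @ squeezed, 0.0)  # (C // r,) ReLU bottleneck
    gate = sigmoid(w_up @ hidden)                # (C,) per-channel weights in (0, 1)
    return feats * gate                          # same gate applied at every position


rng = np.random.default_rng(0)
C = 16
hand = rng.standard_normal((7 * 7, C))      # features of a 7x7 hand RoI
region = rng.standard_normal((14 * 14, C))  # features of a 14x14 hand-object region
w_down = rng.standard_normal((C // 4, C))
w_up = rng.standard_normal((C, C // 4))
contact_feats = salient_gate(affinity_pool(hand, region), w_down, w_up)
print(contact_feats.shape)  # (49, 16)
```

Both steps keep the hand grid's shape. That is what lets features of the surrounding contact region flow into the hand's existing per-RoI prediction head without changing its input size.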

Motivation and Objectives

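Address the challenge of detecting hands and recognizing their physical contact state in unconstrained real-world images, where local hand appearance alone is not enough for accurate inference.
Overcome the lack of training annotations indicating which object, or which part of an object, a hand is touching, a gap that hinders supervised learning of contact reasoning.
Develop a unified deep learning framework that jointly localizes hands and predicts their contact states, using attention mechanisms to model spatial and feature-level relationships between hands and objects.
Build ContactHands, a large-scale dataset of real-world images, to support training and evaluating hand contact recognition models in unconstrained settings.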

Proposed Method

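Two new attention mechanisms are added on top of Mask-RCNN. The first is based on the affinity between the hand and a region enclosing both the hand and the object. It densely pools features from this joint hand-object region into the hand region.
A second attention module adaptively selects salient features from this plausible region of contact to refine the contact-state prediction.
Outputs of a separate object detector supply the locations of objects in the scene. These locations define the plausible contact regions used in the attention computation.
The network is end-to-end trainable: all components, including the attention modules, are optimized jointly during training.
Feature extraction and RoIAlign region pooling are used in the attention modules to preserve spatial resolution and improve localization accuracy.
The model is trained on the ContactHands dataset, whose images are annotated with hand locations and contact states.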

Experimental Results

Research Questions

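RQ1: Can a deep learning model jointly detect hands and recognize their physical contact states in unconstrained real-world images?
RQ2: How effective are attention mechanisms at modeling the spatial and feature-level relationships between hands and objects for contact recognition?
RQ3: To what extent does incorporating object detector outputs improve the accuracy of hand contact-state prediction?
RQ4: How does the proposed model's contact recognition performance compare with a standard Mask-RCNN baseline?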

Key Findings

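The proposed model achieves about a 7% relative improvement in hand contact-state recognition over the baseline built on vanilla Mask-RCNN.
The two attention mechanisms strengthen the features used for contact reasoning by focusing on salient regions of hand-object interaction.
The ContactHands dataset provides a benchmark for evaluating hand detection and contact recognition in unconstrained real-world scenes.
Integrating object detector outputs improves the model's ability to localize and reason about plausible contact regions.
End-to-end training lets the attention modules and the full detection-and-recognition pipeline be optimized jointly.
The attention-based feature refinement makes the model robust in complex scenes with occlusions and diverse hand-object interactions.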
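The 7% figure is a relative improvement, which is not the same as a 7-point absolute gain. The sketch below shows the arithmetic. This review does not state which metric the gain is measured on, and the scores used here are invented purely for illustration. They are not the paper's results.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative gain of `proposed` over `baseline` as a fraction (0.07 means 7%)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (proposed - baseline) / baseline


def absolute_gain(baseline: float, proposed: float) -> float:
    """Absolute gain in the metric's own units (e.g. percentage points)."""
    return proposed - baseline


# Hypothetical scores, NOT taken from the paper.
base, ours = 50.0, 53.5
print(f"{relative_improvement(base, ours):.0%} relative, "
      f"{absolute_gain(base, ours):.1f} points absolute")
# 7% relative, 3.5 points absolute
```

The same 7% relative gain corresponds to a smaller absolute gain when the baseline score is lower. Relative and absolute figures should therefore not be compared directly across papers.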

This review was generated by AI and reviewed by human editors.