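[Paper Summary] Robustness of Object Recognition under Extreme Occlusion in Humans and Computational Models
This paper shows that humans are highly robust to extreme real-world occlusion, whereas convolutional neural networks lag behind; a two-stage compositional model achieves robustness closer to human performance under extreme occlusion.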
Most objects in the visual world are partially occluded, but humans can recognize them without difficulty. However, it remains unknown whether object recognition models like convolutional neural networks (CNNs) can handle real-world occlusion. It is also a question whether efforts to make these models robust to constant mask occlusion are effective for real-world occlusion. We test both humans and the above-mentioned computational models in a challenging task of object recognition under extreme occlusion, where target objects are heavily occluded by irrelevant real objects in real backgrounds. Our results show that human vision is very robust to extreme occlusion while CNNs are not, even with modifications to handle constant mask occlusion. This implies that the ability to handle constant mask occlusion does not entail robustness to real-world occlusion. As a comparison, we propose another computational model that utilizes object parts/subparts in a compositional manner to build robustness to occlusion. This performs significantly better than CNN-based models on our task with error patterns similar to humans. These findings suggest that testing under extreme occlusion can better reveal the robustness of visual recognition, and that the principle of composition can encourage such robustness.
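Research Motivation and Goals
- Assess how robust humans and computational models are to extreme real-world occlusion.
- Assess whether robustness to constant mask occlusion transfers to real-world occlusion.
- Compare CNNs, a hybrid Hopfield-CNN model, and a two-stage compositional model on occluded vehicle images.
- Investigate whether object parts and compositional structure improve robustness to occlusion.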
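Proposed Method
- Collect human performance on a dataset of heavily occluded vehicle images with real occluders.
- Evaluate CNNs adapted for occlusion (AlexNet, ResNet, VGG16).
- Test a hybrid CNN+Hopfield model trained on fc7 features.
- Propose and evaluate a two-stage compositional model that uses part detection and spatial voting together with spatial pyramid pooling.
- Compare human and model representations using category-level confusion matrices and representational dissimilarity matrices (RDMs).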
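The spatial pyramid pooling step mentioned in the list above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and the shape of the part-detection score maps are all assumptions made for illustration. The idea shown is only that per-part response maps are pooled over progressively finer grids, so a part that is still visible contributes to the feature vector even when other regions are occluded.

```python
import numpy as np


def spatial_pyramid_pool(score_maps, levels=(1, 2, 4)):
    """Max-pool part-detection score maps over a pyramid of grids.

    score_maps: array of shape (num_parts, H, W); hypothetical part-detector
        responses, one map per part.
    levels: grid sizes; level L splits each map into L x L cells (assumed values).
    Returns a 1-D vector of length num_parts * sum(L * L for L in levels).
    """
    score_maps = np.asarray(score_maps, dtype=float)
    num_parts, height, width = score_maps.shape
    if min(height, width) < max(levels):
        raise ValueError("score maps are too small for the finest pyramid level")

    pooled = []
    for level in levels:
        # Integer cell boundaries that cover the whole map without gaps.
        row_edges = np.linspace(0, height, level + 1).astype(int)
        col_edges = np.linspace(0, width, level + 1).astype(int)
        for i in range(level):
            for j in range(level):
                cell = score_maps[:,
                                  row_edges[i]:row_edges[i + 1],
                                  col_edges[j]:col_edges[j + 1]]
                # Strongest response of each part inside this cell.
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)


# Toy input: 3 parts on an 8 x 8 grid, with a single detection for part 0.
toy_maps = np.zeros((3, 8, 8))
toy_maps[0, 0, 0] = 1.0
toy_features = spatial_pyramid_pool(toy_maps)
print(toy_features.shape)
```

In a full pipeline the pooled vector would feed a classifier; that stage, and the spatial voting step, are omitted here because the summary does not describe them in enough detail.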
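Experimental Results
Research Questions
- RQ1: Can humans recognize heavily occluded real-world objects when the occluders are real objects in complex backgrounds?
- RQ2: Do CNNs and hybrid models show robustness close to that of humans under extreme occlusion?
- RQ3: Does exploiting object parts and compositional structure improve robustness to occlusion?
- RQ4: Does robustness to constant mask occlusion predict robustness to real-world occlusion?
- RQ5: Under extreme occlusion, how do the error patterns of different models differ from human error patterns?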
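Main Findings
- Humans maintain high recognition accuracy under extreme occlusion, showing strong robustness.
- CNNs perform well without occlusion but are not robust to extreme occlusion.
- The Hopfield-augmented CNN improves performance under constant mask occlusion but does not improve robustness to extreme occlusion.
- A two-stage compositional model using part detection and spatial voting reaches 67.0% accuracy under extreme occlusion, outperforming the CNNs and the hybrid model in this setting.
- The compositional model shows error patterns similar to those of humans, and its correlations with the human category-level confusion matrix and image-level RDM are higher than those of the other models.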
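The comparison behind the last finding can be sketched as follows. This is a generic illustration of RDM and confusion-matrix comparison, not the paper's exact procedure: the dissimilarity measure (1 minus Pearson correlation), the use of Spearman rank correlation, and all function names are assumptions. An RDM is built from per-image response vectors, and two RDMs are compared by correlating their upper-triangle entries; confusion matrices are compared on their off-diagonal (error) entries.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson r between rows.

    features: array of shape (num_images, num_dims), one response vector
        per image (hypothetical human or model responses).
    """
    return 1.0 - np.corrcoef(np.asarray(features, dtype=float))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # Positions i..j are tied; assign their mean 1-based rank.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def rdm_similarity(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs (the diagonal is excluded)."""
    idx = np.triu_indices_from(rdm_a, k=1)
    return spearman(rdm_a[idx], rdm_b[idx])


def confusion_similarity(conf_a, conf_b):
    """Correlate the off-diagonal (error) entries of two confusion matrices."""
    mask = ~np.eye(conf_a.shape[0], dtype=bool)
    return spearman(conf_a[mask], conf_b[mask])


# Toy data standing in for human and model response vectors.
rng = np.random.default_rng(0)
human_features = rng.normal(size=(6, 20))
model_features = human_features + 0.5 * rng.normal(size=(6, 20))
print(rdm_similarity(rdm(human_features), rdm(model_features)))
```

Excluding the diagonal matters in both cases: RDM diagonals are zero by construction, and confusion-matrix diagonals reflect accuracy rather than which errors are made.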
This summary was generated by AI and reviewed by human editors.