QUICK REVIEW

[Paper Review] Multispectral Deep Neural Networks for Pedestrian Detection

Jingjing Liu, Shaoting Zhang|arXiv (Cornell University)|Nov 8, 2016

Advanced Neural Network Applications|References: 29|Cited by: 53

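ONE-SENTENCE SUMMARY

This paper proposes four multispectral ConvNet fusion architectures that combine color and thermal images inside a deep neural network to improve pedestrian detection. The Halfway Fusion model, which fuses features at the middle convolutional layers, achieves state-of-the-art performance on the KAIST benchmark with a miss rate of 36.99%, which is 11% lower than the baseline Faster R-CNN and 3.5% lower than the other fusion architectures.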

ABSTRACT

Multispectral pedestrian detection is essential for around-the-clock applications, e.g., surveillance and autonomous driving. We deeply analyze Faster R-CNN for multispectral pedestrian detection task and then model it into a convolutional network (ConvNet) fusion problem. Further, we discover that ConvNet-based pedestrian detectors trained by color or thermal images separately provide complementary information in discriminating human instances. Thus there is a large potential to improve pedestrian detection by using color and thermal images in DNNs simultaneously. We carefully design four ConvNet fusion architectures that integrate two-branch ConvNets on different DNNs stages, all of which yield better performance compared with the baseline detector. Our experimental results on KAIST pedestrian benchmark show that the Halfway Fusion model that performs fusion on the middle-level convolutional features outperforms the baseline method by 11% and yields a missing rate 3.5% lower than the other proposed architectures.

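MOTIVATION AND OBJECTIVES

Address the limited performance of single-modality pedestrian detectors in low-light and nighttime conditions.
Explore how to fuse multispectral (color and thermal) data effectively inside a deep neural network to improve detection.
Study how the fusion stage (early, halfway, late, or score level) affects detection performance.
Design and evaluate several ConvNet fusion architectures to identify the best fusion strategy for multispectral pedestrian detection.
Achieve state-of-the-art performance on the KAIST multispectral pedestrian benchmark, which targets around-the-clock applications.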

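PROPOSED METHOD

Adapt Faster R-CNN into a general ConvNet framework and train separate detectors on color images and on thermal images.
Design four fusion architectures: Early Fusion (low-level features), Halfway Fusion (mid-level features), Late Fusion (high-level features), and Score Fusion (confidence scores).
Perform the fusion at different stages of the ConvNet to measure how the fusion stage affects detection performance.
Train and evaluate all fusion models on the KAIST multispectral pedestrian dataset using standard metrics such as miss rate (MR) and recall.
Use the region proposal network (RPN) to assess proposal quality by measuring recall at different numbers of proposals and different IoU thresholds.
Compare all models against the baseline Faster R-CNN and the ACF-C-T detector to verify the improvement.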
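
The summary above does not say which operator merges the two branches, so the following is only a minimal sketch under an assumption: the color and thermal feature maps are concatenated along the channel axis and then mixed by a 1x1 convolution that brings the channel count back down. The names `concat_channels`, `conv1x1`, and `fuse` are hypothetical, and the weights are placeholders rather than learned values. Applying the same step to low-level, mid-level, or high-level features would correspond to early, halfway, or late fusion respectively.

```python
from typing import List

# A feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def concat_channels(color: FeatureMap, thermal: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis."""
    color_size = (len(color[0]), len(color[0][0]))
    thermal_size = (len(thermal[0]), len(thermal[0][0]))
    if color_size != thermal_size:
        raise ValueError("feature maps must share the same spatial size")
    return color + thermal


def conv1x1(features: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Apply a 1x1 convolution without bias.

    weights[o][k] is the weight from input channel k to output channel o,
    so each output pixel is a weighted sum of the input channels at that pixel.
    """
    height, width = len(features[0]), len(features[0][0])
    fused = []
    for channel_weights in weights:
        if len(channel_weights) != len(features):
            raise ValueError("one weight per input channel is required")
        fused.append([
            [
                sum(w * features[k][i][j] for k, w in enumerate(channel_weights))
                for j in range(width)
            ]
            for i in range(height)
        ])
    return fused


def fuse(color: FeatureMap, thermal: FeatureMap,
         weights: List[List[float]]) -> FeatureMap:
    """Assumed fusion step: channel concatenation, then 1x1 channel mixing."""
    return conv1x1(concat_channels(color, thermal), weights)


if __name__ == "__main__":
    color_features = [[[1.0, 2.0]]]    # 1 channel, 1x2 map (toy values)
    thermal_features = [[[3.0, 4.0]]]  # 1 channel, 1x2 map (toy values)
    mixing_weights = [[0.5, 0.5]]      # 2 input channels -> 1 output channel
    print(fuse(color_features, thermal_features, mixing_weights))
```

In a real detector the 1x1 weights would be learned jointly with the rest of the network. Score-level fusion would instead combine the confidence scores of two complete detectors rather than their feature maps.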

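EXPERIMENTAL RESULTS

Research Questions

RQ1: How does fusing color and thermal images at different stages of a deep neural network affect pedestrian detection performance?
RQ2: In multispectral pedestrian detection, does fusing mid-level ConvNet features yield better detection synergy than early or late fusion?
RQ3: Does combining the complementary information in color and thermal images significantly reduce the miss rate relative to single-modality detectors?
RQ4: When the number of proposals is small, how much does multispectral fusion improve proposal quality in the RPN, as measured by recall?
RQ5: Which fusion strategy is most robust across varying lighting and environmental conditions in real-world pedestrian detection?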

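Key Findings

The Halfway Fusion model, which fuses features at the middle convolutional layers, achieves the lowest overall miss rate on the KAIST benchmark, 36.99%.
Compared with the baseline Faster R-CNN, Halfway Fusion lowers the miss rate by 11%, showing that multispectral fusion brings a substantial gain.
With only 50 proposals, Halfway Fusion reaches 94% recall, outperforming Faster R-CNN-C and Faster R-CNN-T, which need about 80 proposals to reach similar recall.
At 300 proposals, Halfway Fusion reaches 93.9% recall at an IoU threshold of 0.6, higher than the other models, indicating higher-quality proposals that overlap better with the ground truth.
Compared with the next-best fusion architecture, Halfway Fusion lowers the miss rate by a further 3.5%, confirming its advantage in exploiting multispectral synergy.
Detectors trained separately on color and on thermal images make complementary detection decisions, which supports the potential of multispectral fusion for robust around-the-clock pedestrian detection.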
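
The recall figures above depend on two settings: how many top-scoring proposals are kept and what IoU threshold counts as a match. The sketch below shows one common way to compute such a number for a single image. It is an illustration, not the paper's evaluation code: the function names `iou` and `recall_at`, the `(x1, y1, x2, y2)` box format, and the toy boxes are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at(proposals: List[Tuple[Box, float]], ground_truth: List[Box],
              top_n: int, iou_thresh: float) -> float:
    """Fraction of ground-truth boxes matched by at least one of the
    top_n highest-scoring proposals with IoU >= iou_thresh."""
    if not ground_truth:
        return 0.0
    kept = sorted(proposals, key=lambda p: p[1], reverse=True)[:top_n]
    hits = sum(
        1 for gt in ground_truth
        if any(iou(box, gt) >= iou_thresh for box, _ in kept)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    gt_boxes = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
    scored_proposals = [
        ((0.0, 0.0, 10.0, 10.0), 0.9),    # exact match for the first box
        ((50.0, 50.0, 60.0, 60.0), 0.8),  # background
        ((20.0, 20.0, 30.0, 28.0), 0.7),  # IoU 0.8 with the second box
    ]
    print(recall_at(scored_proposals, gt_boxes, top_n=2, iou_thresh=0.6))
    print(recall_at(scored_proposals, gt_boxes, top_n=3, iou_thresh=0.6))
```

In this toy case, keeping two proposals covers one of the two pedestrians, and keeping three covers both, which mirrors why recall is reported as a function of the proposal budget. A dataset-level figure would pool hits and ground-truth counts over all images, and the miss rate at a fixed operating point is one minus the recall at that point.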

This review was generated by AI and reviewed by human editors.