QUICK REVIEW

[Paper Review] Depth-Enhanced Feature Pyramid Network for Occlusion-Aware Verification of Buildings from Oblique Images

Qing Zhu, Shengzhi Huang|arXiv (Cornell University)|Nov 26, 2020

Remote Sensing and LiDAR Applications | References: 62 | Cited by: 15

One-Sentence Summary

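This paper proposes a depth-enhanced fused feature pyramid network (FFP) that combines color and depth data from oblique images and 3D mesh models to improve occlusion-aware building verification. By integrating multiscale features with a multi-view voting mechanism, the method achieves 98.1% recall and 97.2% precision on the ISPRS Zurich dataset and outperforms ResNet and EfficientNet by 5% and 2%, respectively, in recall and precision. The result is near-fully-automatic detection of changed buildings that needs only minimal manual review.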

ABSTRACT

Detecting the changes of buildings in urban environments is essential. Existing methods that use only nadir images suffer from severe problems of ambiguous features and occlusions between buildings and other regions. Furthermore, buildings in urban environments vary significantly in scale, which leads to performance issues when using single-scale features. To solve these issues, this paper proposes a fused feature pyramid network, which utilizes both color and depth data for the 3D verification of existing buildings 2D footprints from oblique images. First, the color data of oblique images are enriched with the depth information rendered from 3D mesh models. Second, multiscale features are fused in the feature pyramid network to convolve both the color and depth data. Finally, multi-view information from both the nadir and oblique images is used in a robust voting procedure to label changes in existing buildings. Experimental evaluations using both the ISPRS benchmark datasets and Shenzhen datasets reveal that the proposed method outperforms the ResNet and EfficientNet networks by 5% and 2%, respectively, in terms of recall rate and precision. We demonstrate that the proposed method can successfully detect all changed buildings; therefore, only those marked as changed need to be manually checked during the pipeline updating procedure; this significantly reduces the manual quality control requirements. Moreover, ablation studies indicate that using depth data, feature pyramid modules, and multi-view voting strategies can lead to clear and progressive improvements.

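Motivation and Objectives

The main goal is to reduce false positives in building change detection by exploiting 3D depth information from photogrammetric mesh models.
To handle the scale variation of urban buildings, the method proposes a unified multiscale feature pyramid network for detecting both roofs and facades.
To improve robustness, the method uses a voting strategy that fuses multi-view information from nadir and oblique views.
The aim is to detect 100% of changed buildings (true positives) while keeping the false-positive rate low, which minimizes the manual quality-control workload.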
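The link between these objectives and the manual workload can be made concrete with a small sketch. It assumes the usual definitions of recall and precision over changed buildings; the function names and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) for the 'changed' class.

    tp: changed buildings correctly flagged as changed
    fp: unchanged buildings wrongly flagged as changed
    fn: changed buildings that were missed
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def manual_review_count(tp, fp):
    """Number of buildings an operator has to inspect.

    Assumption: when recall is 100% (fn == 0), only flagged buildings
    need checking, so the workload is every flagged building.
    """
    return tp + fp
```

With made-up counts of 40 true positives, 10 false positives, and 0 misses, `recall_precision(40, 10, 0)` gives a recall of 1.0 and a precision of 0.8, and `manual_review_count(40, 10)` gives 50 buildings to inspect. Once nothing is missed, every false positive removed is one fewer building to inspect.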

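Proposed Method

A 3D mesh model is reconstructed with a structure-from-motion (SfM) and multi-view stereo (MVS) pipeline.
Using the known camera poses, the 3D mesh model is rendered into depth maps from the viewpoint of each oblique image.
The color and depth images are fused as input to a modified feature pyramid network (FFP), which adds a fusion module to support multiscale feature learning.
Visible regions are extracted through depth-based occlusion detection, and the FFP classifies each region as roof, facade, or background.
A robust multi-view voting strategy combines predictions from multiple views (nadir and oblique) to make detection more reliable.
Extruded building footprints taken from the database are projected into every view in which they are visible, guiding both region extraction and training.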
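Two of the steps above, fusing color with rendered depth and extracting visible regions by a depth test, can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the min-max depth normalization, the simple channel stacking, and the tolerance `tol` are all hypothetical choices, and the depth maps are assumed to have already been rendered from the mesh and from the extruded footprint.

```python
import numpy as np


def stack_rgbd(rgb, depth):
    """Stack an HxWx3 uint8 color image and an HxW depth map into HxWx4 float32.

    Color is scaled to [0, 1]. Depth is min-max normalized over finite
    pixels; pixels with no mesh hit (non-finite depth) are set to 0.
    """
    rgb = rgb.astype(np.float32) / 255.0
    depth = depth.astype(np.float32)
    valid = np.isfinite(depth)
    norm = np.zeros_like(depth)
    if valid.any():
        d_min, d_max = depth[valid].min(), depth[valid].max()
        span = max(d_max - d_min, 1e-6)  # avoid division by zero on flat depth
        norm[valid] = (depth[valid] - d_min) / span
    return np.concatenate([rgb, norm[..., None]], axis=-1)


def visible_mask(mesh_depth, footprint_depth, tol=0.5):
    """Boolean HxW mask of footprint pixels that are not occluded.

    mesh_depth: depth rendered from the full scene mesh.
    footprint_depth: depth of the extruded footprint alone, NaN outside it.
    A pixel counts as visible when the scene surface is not closer to the
    camera than the footprint surface by more than tol (same units as depth).
    """
    inside = np.isfinite(footprint_depth)
    with np.errstate(invalid="ignore"):  # comparisons against NaN yield False
        not_occluded = mesh_depth >= footprint_depth - tol
    return inside & not_occluded
```

The four-channel array would feed a network whose first layer accepts four input channels, and the mask restricts classification to pixels where the building surface is actually seen. How the paper's fusion module combines the two modalities inside the pyramid is not specified in this review, so plain channel stacking stands in for it here.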

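Experimental Results

Research Questions

RQ1: Do depth-enhanced features from 3D mesh models improve the separability of building features in ambiguous urban textures?
RQ2: Does a feature pyramid network that fuses color and depth data outperform standard CNNs such as ResNet and EfficientNet on the building verification task?
RQ3: Can the multi-view voting strategy reduce the false-positive rate while still detecting 100% of changed buildings?
RQ4: How much do depth data and multiscale feature fusion improve overall performance across building sizes and under complex occlusion?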

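Key Findings

The proposed FFP network achieves 98.1% recall and 97.2% precision on the ISPRS Zurich dataset, improving on ResNet by 5.7 percentage points in recall and 5.0 percentage points in precision.
On the Shenzhen dataset, the method achieves 96.5% recall and 95.4% precision, which is 6.5 and 7.0 percentage points higher than ResNet, respectively.
Ablation studies show that adding depth data alone raises precision by more than 1% in every network, with the FFP network gaining the most.
The multi-view voting strategy detects 100% of changed buildings while keeping the false-positive rate low, resolving the ambiguity of single-view analysis.
Performance is consistent across building sizes: even for small buildings (<100 m²) the correct rate stays above 92%, indicating strong robustness to scale variation.
Combining depth fusion, feature pyramid fusion, and multi-view voting yields progressive, quantifiable improvements in detection performance.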
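The multi-view voting idea can be illustrated with a short sketch. The review does not give the paper's exact voting rule, so the strict-majority rule, the conservative tie-breaking, the `min_views` threshold, and the label strings below are all assumptions made for illustration.

```python
def vote_building(view_labels, min_views=2):
    """Combine per-view labels for one building into a single decision.

    view_labels: one label per view in which the building is visible,
        each being "roof", "facade", or "background".
    Returns "unchanged" when a strict majority of views sees building
    evidence (roof or facade), and "changed" otherwise. Ties, and
    buildings seen in fewer than min_views views, are labeled "changed"
    so that doubtful cases go to manual review rather than being missed.
    """
    if len(view_labels) < min_views:
        return "changed"
    support = sum(label in ("roof", "facade") for label in view_labels)
    return "unchanged" if support * 2 > len(view_labels) else "changed"
```

For instance, `vote_building(["roof", "facade", "background"])` returns `"unchanged"`, while the tie in `vote_building(["roof", "background"])` returns `"changed"`. Sending ties to `"changed"` trades a few extra false positives for a lower chance of missing a real change, which matches the stated priority of detecting every changed building.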


This review was generated by AI and reviewed by a human editor.