QUICK REVIEW

[Paper Review] Dual Local-Global Contextual Pathways for Recognition in Aerial Imagery

Alina Marcu, Marius Leordeanu|arXiv (Cornell University)|May 18, 2016

Topic: Video Surveillance and Tracking Methods | References: 2 | Cited by: 24

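One-Sentence Summary

This paper proposes a dual-stream deep convolutional neural network (LG-Seg) that performs semantic segmentation of aerial imagery by jointly learning local object appearance and global scene context. By combining a VGG-Net for local features with a modified AlexNet for global context, the model achieves state-of-the-art performance on the Massachusetts Buildings Dataset, indicating that complementary local and global reasoning markedly improves recognition under challenging conditions such as occlusion and low resolution.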

ABSTRACT

Visual context is important in object recognition and it is still an open problem in computer vision. Along with the advent of deep convolutional neural networks (CNN), using contextual information with such systems starts to receive attention in the literature. At the same time, aerial imagery is gaining momentum. While advances in deep learning make good progress in aerial image analysis, this problem still poses many great challenges. Aerial images are often taken under poor lighting conditions and contain low resolution objects, many times occluded by trees or taller buildings. In this domain, in particular, visual context could be of great help, but there are still very few papers that consider context in aerial image understanding. Here we introduce context as a complementary way of recognizing objects. We propose a dual-stream deep neural network model that processes information along two independent pathways, one for local and another for global visual reasoning. The two are later combined in the final layers of processing. Our model learns to combine local object appearance as well as information from the larger scene at the same time and in a complementary way, such that together they form a powerful classifier. We test our dual-stream network on the task of segmentation of buildings and roads in aerial images and obtain state-of-the-art results on the Massachusetts Buildings Dataset. We also introduce two new datasets, for buildings and road segmentation, respectively, and study the relative importance of local appearance vs. the larger scene, as well as their performance in combination. While our local-global model could also be useful in general recognition tasks, we clearly demonstrate the effectiveness of visual context in conjunction with deep nets for aerial image understanding.

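Research Motivation and Objectives

Improve semantic segmentation performance in aerial imagery by integrating visual context into deep learning models.
Investigate whether global scene context improves recognition accuracy when local features are ambiguous because of low resolution, occlusion, or poor lighting.
Design a dual-stream architecture that learns complementary local and global representations of visual context without explicit supervision.
Demonstrate the effectiveness of joint local-global reasoning on real-world aerial datasets, including new benchmarks for buildings and roads.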

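Proposed Method

The model uses two parallel pathways: the first, based on a fine-tuned VGG-Net, extracts local high-resolution features from small image patches.
The second pathway uses a modified AlexNet to process a larger global image region and capture scene context.
Features from the two pathways are concatenated in the final fully connected layers, enabling joint reasoning and resolving conflicts between the two.
The network is trained end to end with a joint semantic segmentation loss against pixel-level annotations.
Ablation studies mask one pathway at inference time (by feeding it a mean blank image) to isolate each pathway's contribution.
The architecture is evaluated on the Massachusetts Buildings Dataset and on two newly introduced datasets for building and road segmentation.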
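The two-pathway design and the masking ablation described above can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: single dense layers stand in for VGG-Net and the modified AlexNet, and the patch sizes, feature width, zero padding at image borders, and all names (`crop_centered`, `mean_image`, `DualPathwayToy`) are assumptions made for the example.

```python
import random

def crop_centered(image, cy, cx, size):
    """Crop a size x size window centered at (cy, cx); pixels outside the image become 0.0."""
    half = size // 2
    h, w = len(image), len(image[0])
    return [
        [image[y][x] if 0 <= y < h and 0 <= x < w else 0.0
         for x in range(cx - half, cx - half + size)]
        for y in range(cy - half, cy - half + size)
    ]

def mean_image(size, value):
    """Constant 'blank' patch used to silence one pathway during ablation."""
    return [[value] * size for _ in range(size)]

def flatten(patch):
    return [v for row in patch for v in row]

def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

class DualPathwayToy:
    """Toy stand-in: one dense layer per pathway, fused by concatenation."""

    def __init__(self, local_size=4, global_size=8, feat=3, seed=0):
        rng = random.Random(seed)

        def init(n_out, n_in):
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

        self.local_size, self.global_size = local_size, global_size
        self.w_local = init(feat, local_size ** 2)    # stand-in for the local (VGG-like) pathway
        self.w_global = init(feat, global_size ** 2)  # stand-in for the global (AlexNet-like) pathway
        self.w_fuse = init(1, 2 * feat)               # final layer over the concatenated features
        self.zero_bias = [0.0] * feat

    def forward(self, local_patch, global_patch):
        f_local = relu(linear(flatten(local_patch), self.w_local, self.zero_bias))
        f_global = relu(linear(flatten(global_patch), self.w_global, self.zero_bias))
        fused = f_local + f_global  # list concatenation of the two feature vectors
        return linear(fused, self.w_fuse, [0.0])[0]

# Synthetic 16 x 16 grayscale "aerial image" (random values, not real data).
rng = random.Random(1)
image = [[rng.random() for _ in range(16)] for _ in range(16)]
model = DualPathwayToy()
cy, cx = 8, 8

# Both pathways look at the same location, at different spatial extents.
local_patch = crop_centered(image, cy, cx, model.local_size)
global_patch = crop_centered(image, cy, cx, model.global_size)
mean_value = sum(flatten(image)) / (16 * 16)

full_score = model.forward(local_patch, global_patch)
# Ablation: replace one pathway's input with a constant mean image.
local_only = model.forward(local_patch, mean_image(model.global_size, mean_value))
global_only = model.forward(mean_image(model.local_size, mean_value), global_patch)
```

Comparing `full_score`, `local_only`, and `global_only` at each location mirrors how the ablation isolates what each pathway contributes. In a real model the scores would come from trained convolutional networks rather than random dense layers.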

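Experimental Results

Research Questions

RQ1: When local features are degraded by occlusion or low resolution, does global visual context significantly improve semantic segmentation accuracy in aerial imagery?
RQ2: How do the local and global pathways differ in their contributions to the final segmentation output, and do their roles emerge automatically through joint training?
RQ3: Does combining local appearance with global scene context outperform models that rely on local features alone?
RQ4: How does the relative importance of local appearance and global context vary across different aerial imagery scenarios?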

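Key Findings

The proposed LG-Seg model achieves state-of-the-art performance on the Massachusetts Buildings Dataset, outperforming existing methods that rely on local appearance alone.
With only the local pathway active, the model produces sharp, detailed segmentations of individual buildings, indicating strong local feature learning.
With only the global pathway active, the model produces soft, coherent segmentations of residential areas that closely resemble the output of a dedicated residential-area classifier.
The two pathways learn complementary roles automatically during training, without explicit supervision: the local pathway handles detail, and the global pathway handles scene-level consistency.
The model is robust to occlusion and low illumination, and the global pathway effectively suppresses hallucinated local detections in low-density areas.
Ablation studies confirm that combining local and global features outperforms either pathway on its own.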


This review was generated by AI and reviewed by human editors.