QUICK REVIEW

[Paper Review] RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images

Lichao Mou, Xiao Xiang Zhu | arXiv (Cornell University) | May 5, 2018

Advanced Neural Network Applications | References: 40 | Cited by: 66

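One-Sentence Summary

RiFCN is a bidirectional network whose recurrent backward stream fuses multi-level CNN features to improve pixel-level semantic segmentation of high resolution remote sensing images; it reports better scores than FCN and SegNet on the ISPRS Potsdam and Inria data sets.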

ABSTRACT

Semantic segmentation in high resolution remote sensing images is a fundamental and challenging task. Convolutional neural networks (CNNs), such as fully convolutional network (FCN) and SegNet, have shown outstanding performance in many segmentation tasks. One key pillar of these successes is mining useful information from features in convolutional layers for producing high resolution segmentation maps. For example, FCN nonlinearly combines high-level features extracted from last convolutional layers; whereas SegNet utilizes a deconvolutional network which takes as input only coarse, high-level feature maps of the last convolutional layer. However, how to better fuse multi-level convolutional feature maps for semantic segmentation of remote sensing images is underexplored. In this work, we propose a novel bidirectional network called recurrent network in fully convolutional network (RiFCN), which is end-to-end trainable. It has a forward stream and a backward stream. The former is a classification CNN architecture for feature extraction, which takes an input image and produces multi-level convolutional feature maps from shallow to deep; while in the latter, to achieve accurate boundary inference and semantic segmentation, boundary-aware high resolution feature maps in shallower layers and high-level but low-resolution features are recursively embedded into the learning framework (from deep to shallow) to generate a fused feature representation that draws a holistic picture of not only high-level semantic information but also low-level fine-grained details. Experimental results on two widely-used high resolution remote sensing data sets for semantic segmentation tasks, ISPRS Potsdam and Inria Aerial Image Labeling Data Set, demonstrate competitive performance obtained by the proposed methodology compared to other studied approaches.

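Motivation and Objectives

Motivate better fusion of multi-level CNN features so that boundaries in high resolution remote sensing images are delineated more precisely.
Propose the bidirectional RiFCN architecture, which pairs a forward feature extractor with a backward recurrent fusion stream.
Train the whole network end to end to improve pixel-level semantic segmentation performance.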

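Proposed Method

Forward stream: a 5-level, VGG-16-style CNN that produces multi-level feature maps with 3x3 convolutions and 2x2 max pooling; padding keeps the spatial resolution unchanged across each convolution, and ReLU is the activation.
Backward stream: a recurrent, autoregressive fusion process that upsamples with deconvolution and integrates high-level features top-down into the shallower levels (the function Φ).
Fusion equation: F_bwd^l = Φ(F_fwd^l, F_bwd^{l+1}), where Φ combines a convolution term on the forward-stream features with a deconvolution term on the deeper backward features; gradients in backpropagation accumulate across levels (Eq. 6), and parameters are updated with momentum (Eq. 7).
Loss: pixel-wise cross-entropy over M classes, parameterized by the forward- and backward-stream weights (W, W_fwd, W_bwd).
Training: end to end in TensorFlow with Nesterov Adam, a small batch size, data augmentation, early stopping, and 30 epochs.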
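
The top-down recursion F_bwd^l = Φ(F_fwd^l, F_bwd^{l+1}) can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the paper's implementation: feature maps are single-channel 2D lists, nearest-neighbour 2x upsampling stands in for the learned deconvolution, scalar weights `w_fwd` and `w_bwd` stand in for the convolution kernels, and the deepest backward map is assumed to equal the deepest forward map. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling (assumed stand-in for a learned deconvolution)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def relu(x):
    return x if x > 0.0 else 0.0


def fuse(fwd, bwd_deeper, w_fwd, w_bwd):
    """Toy Phi: forward term at level l plus upsampled backward term from level l+1."""
    up = upsample2x(bwd_deeper)
    return [
        [relu(w_fwd * f + w_bwd * u) for f, u in zip(f_row, u_row)]
        for f_row, u_row in zip(fwd, up)
    ]


def backward_stream(fwd_maps, w_fwd=1.0, w_bwd=1.0):
    """fwd_maps is ordered shallow -> deep; returns fused maps in the same order."""
    bwd = [None] * len(fwd_maps)
    bwd[-1] = fwd_maps[-1]  # assumption: the recursion starts from the deepest forward map
    for level in range(len(fwd_maps) - 2, -1, -1):  # deep -> shallow
        bwd[level] = fuse(fwd_maps[level], bwd[level + 1], w_fwd, w_bwd)
    return bwd


# Toy pyramid with resolutions 4x4, 2x2 and 1x1.
fwd_maps = [
    [[1.0] * 4 for _ in range(4)],
    [[2.0, -3.0], [0.5, 1.0]],
    [[1.0]],
]
fused = backward_stream(fwd_maps)
print(fused[1])  # [[3.0, 0.0], [1.5, 2.0]]
```

The shallowest fused map keeps the full 4x4 resolution while carrying information from every deeper level, which is the property the backward stream is meant to provide.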

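Experimental Results

Research Questions

RQ1: Does a bidirectional network that fuses features from all levels through a recurrent backward stream improve boundary-preserving semantic segmentation of high resolution remote sensing images?
RQ2: Does autoregressive, top-down feature fusion help retain fine-grained detail while maintaining high-level semantic accuracy?
RQ3: How does RiFCN perform relative to standard FCN and SegNet on benchmark high resolution remote sensing data sets?
RQ4: Is the method robust when evaluated against ground truth with eroded boundaries?
RQ5: What are the qualitative and quantitative gains on buildings, roads, and small-object classes in aerial imagery?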

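Key Findings

On ISPRS Potsdam, RiFCN outperforms FCN and SegNet in mean F1 and overall accuracy (OA): 83.70 OA for RiFCN and 86.05 OA for RiFCN[e], versus 80.76 for FCN and 80.64 for SegNet.
RiFCN scores higher across classes, including small objects such as cars (car-class score: 88.91 for RiFCN, 93.73 for RiFCN[e]).
Compared with FCN and SegNet, RiFCN and RiFCN[e] improve consistently on most land-cover classes (impervious surfaces, buildings, low vegetation, trees, cars, clutter/background).
On the Inria Aerial Image Labeling Data Set, RiFCN achieves a competitive IoU and a higher overall accuracy than several baselines, including SegNet and FCN variants (overall IoU/accuracy for RiFCN: 74.00/95.82; RiFCN[e] results are not given in this summary).
The backward recurrent fusion creates a multi-path information flow from deep to shallow layers, improving boundary delineation and semantic consistency, as the qualitative results show.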
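
For readers less familiar with the metrics quoted above, the following sketch shows how overall accuracy, per-class F1, and per-class IoU are commonly derived from a confusion matrix. It uses the standard textbook definitions; the function names and the toy labels are illustrative assumptions, and the paper's exact evaluation protocol (for example, how boundary pixels are treated) is not reproduced here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def _tp_fp_fn(cm, c):
    tp = cm[c][c]
    fp = sum(cm[r][c] for r in range(len(cm))) - tp  # predicted c, but another class
    fn = sum(cm[c]) - tp                             # true c, but predicted otherwise
    return tp, fp, fn


def per_class_f1(cm):
    scores = []
    for c in range(len(cm)):
        tp, fp, fn = _tp_fp_fn(cm, c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def per_class_iou(cm):
    scores = []
    for c in range(len(cm)):
        tp, fp, fn = _tp_fp_fn(cm, c)
        denom = tp + fp + fn
        scores.append(tp / denom if denom else 0.0)
    return scores


# Toy example: six pixels, three classes (flattened label maps).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
f1 = per_class_f1(cm)
iou = per_class_iou(cm)
mean_f1 = sum(f1) / len(f1)
```

Mean F1 averages the per-class F1 scores, so a small class such as cars weighs as much as a large one, whereas overall accuracy is dominated by the classes that cover the most pixels.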

This review was generated by AI and checked by a human editor.