QUICK REVIEW

[Paper Digest] Learning Arbitrary Pairwise Potentials in CRFs for Semantic Segmentation

Måns Larsson, Anurag Arnab|arXiv (Cornell University)|Jan 24, 2017

Advanced Neural Network Applications|References: 2|Cited by: 2

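ONE-SENTENCE SUMMARY

This paper makes the pairwise potentials of conditional random fields (CRFs) for semantic segmentation learnable, with end-to-end training enabled by performing CRF inference through projected gradient descent. By learning non-Gaussian, image-dependent potentials built on both spatial and bilateral kernels, the method improves segmentation accuracy over standard Gaussian potentials and outperforms earlier CNN+CRF methods on public benchmarks.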

ABSTRACT

Are we using the right potential functions in the Conditional Random Field models that are popular in the Vision community? Semantic segmentation and other pixel-level labelling tasks have made significant progress recently due to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to model spatial priors, label consistencies and feature-based image conditioning. In this paper, we challenge this view by developing a new inference and learning framework which can learn pairwise CRF potentials restricted only by their dependence on the image pixel values and the size of the support. Both standard spatial and high-dimensional bilateral kernels are considered. Our framework is based on the observation that CRF inference can be achieved via projected gradient descent and consequently, can easily be integrated in deep neural networks to allow for end-to-end training. It is empirically demonstrated that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential. In addition, we compare our inference method to the commonly used mean-field algorithm. Our framework is evaluated on several public benchmarks for semantic segmentation with improved performance compared to previous state-of-the-art CNN+CRF models.

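MOTIVATION AND OBJECTIVES

Challenge the assumption that hand-crafted Gaussian potentials are optimal for CRF-based semantic segmentation.
Develop a differentiable inference and learning framework in which the pairwise potentials are restricted only by their dependence on the image pixel values and the size of their support.
Enable end-to-end training of the deep neural network together with the CRF module by performing CRF inference with projected gradient descent.
Verify empirically that learned non-Gaussian potentials can model label interactions better than standard Gaussian potentials.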

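PROPOSED METHOD

Formulate CRF inference as an optimization problem solved by projected gradient descent, so that gradients can be backpropagated through the inference steps.
Design pairwise potentials that depend on the image pixel values and the size of the support, covering both spatial kernels and high-dimensional bilateral kernels.
Introduce a differentiable learning framework in which the potential parameters are optimized end-to-end together with the deep network.
Solve the CRF inference problem with a projected gradient descent scheme in place of the commonly used mean-field approximation.
Support standard spatial kernels and high-dimensional bilateral kernels as flexible, learnable potentials.
Integrate the learned CRF layer into a CNN-based segmentation pipeline so that CNN features and CRF parameters are optimized jointly.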
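
The inference step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy 1D "image" of six pixels with two labels, a Potts-style label compatibility, and a tiny spatial kernel whose weights stand in for learned parameters. All names (`project_simplex`, `pgd_inference`, `kernel`, `compat`) and all numbers are hypothetical. Each iteration takes a gradient step on a relaxed CRF energy and then projects every pixel's label scores back onto the probability simplex.

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:          # condition holds for a prefix of the sorted values
            theta = t
    return [max(x - theta, 0.0) for x in v]


def energy(q, unary, kernel, compat):
    """Relaxed CRF energy: unary costs plus kernel-weighted pairwise costs."""
    n, L = len(q), len(q[0])
    e = sum(unary[i][l] * q[i][l] for i in range(n) for l in range(L))
    for i in range(n):
        for d, w in kernel.items():
            j = i + d
            if 0 <= j < n:
                e += w * sum(compat[l][m] * q[i][l] * q[j][m]
                             for l in range(L) for m in range(L))
    return e


def gradient(q, unary, kernel, compat):
    """Gradient of the relaxed energy with respect to every q[i][l]."""
    n, L = len(q), len(q[0])
    g = [row[:] for row in unary]
    for i in range(n):
        for d, w in kernel.items():
            j = i + d
            if 0 <= j < n:
                for l in range(L):
                    for m in range(L):
                        # each bilinear term contributes to both of its factors
                        g[i][l] += w * compat[l][m] * q[j][m]
                        g[j][m] += w * compat[l][m] * q[i][l]
    return g


def pgd_inference(unary, kernel, compat, steps=100, step_size=0.1):
    """Projected gradient descent starting from uniform label scores."""
    n, L = len(unary), len(unary[0])
    q = [[1.0 / L] * L for _ in range(n)]
    for _ in range(steps):
        g = gradient(q, unary, kernel, compat)
        q = [project_simplex([q[i][l] - step_size * g[i][l] for l in range(L)])
             for i in range(n)]
    return q


# Toy data (assumed): unary COSTS per pixel and label; pixel 1 is noisy.
unary = [[0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
kernel = {-1: 0.5, 1: 0.5}            # hypothetical weights, offset -> weight
compat = [[0.0, 1.0], [1.0, 0.0]]     # Potts: cost only when labels differ

q = pgd_inference(unary, kernel, compat)
labels = [row.index(max(row)) for row in q]
unary_only = [row.index(min(row)) for row in unary]
print(unary_only)  # [0, 1, 0, 1, 1, 1]
print(labels)      # [0, 0, 0, 1, 1, 1]
```

In this toy run the pairwise term overrides the noisy unary cost at pixel 1. Because every step is a gradient update followed by a projection, the loop can in principle be unrolled and differentiated with respect to the kernel weights, which is the property the digest credits for end-to-end training.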

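EXPERIMENTAL RESULTS

Research Questions

RQ1: Do learned non-Gaussian pairwise potentials in CRFs improve semantic segmentation performance over hand-crafted Gaussian potentials?
RQ2: Is end-to-end training of CRF potentials together with a deep neural network feasible via differentiable inference?
RQ3: How does inference with the learned potentials compare with mean-field inference in terms of accuracy and convergence?
RQ4: Are the interactions between certain label classes better modelled by non-Gaussian potentials than by Gaussian ones?
RQ5: Does the proposed framework generalize across semantic segmentation benchmarks and improve on the previous state of the art?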

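Key Findings

The proposed framework improves segmentation accuracy over previous state-of-the-art CNN+CRF models on several public benchmarks.
Learned potentials can improve segmentation accuracy, and certain label class interactions are better modelled by a non-Gaussian potential than by a standard Gaussian one.
End-to-end training through projected gradient descent inference effectively optimizes the CRF parameters together with the deep features.
The results indicate that image-dependent, non-Gaussian potentials can capture spatial and feature-based priors better than fixed Gaussian kernels.
Compared with the mean-field approximation, inference via projected gradient descent yields better segmentation performance.
The framework learns potentials based on both bilateral and spatial kernels that adapt to the image content, improving the consistency of predictions.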
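
To make the contrast between a hand-crafted Gaussian kernel and a free-form learned kernel concrete, here is a small sketch. It is an illustration under assumptions, not taken from the paper: the 3x3 support, the helper names, and the "learned" weights are all hypothetical. A Gaussian spatial kernel has a single shape parameter and is always symmetric about its centre, whereas a kernel with one free weight per offset can encode direction-dependent interactions.

```python
import math


def gaussian_kernel(radius, sigma):
    """Hand-crafted spatial kernel: its shape is fixed by sigma alone."""
    offsets = range(-radius, radius + 1)
    return [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) for dx in offsets]
            for dy in offsets]


def is_point_symmetric(kernel):
    """True if the weight at offset (dy, dx) equals the weight at (-dy, -dx)."""
    n = len(kernel)
    return all(abs(kernel[r][c] - kernel[n - 1 - r][n - 1 - c]) < 1e-12
               for r in range(n) for c in range(n))


radius = 1
gauss = gaussian_kernel(radius, sigma=1.0)

# Hypothetical learned weights: the interaction is active only for
# neighbours in the row below the centre pixel.
free_form = [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0],
             [0.2, 0.9, 0.2]]

gaussian_shape_params = 1                        # sigma
free_form_params = (2 * radius + 1) ** 2 - 1     # one weight per non-centre offset

print(is_point_symmetric(gauss))       # True
print(is_point_symmetric(free_form))   # False
print(gaussian_shape_params, free_form_params)  # 1 8
```

The extra degrees of freedom are what allow a learned potential to depart from the Gaussian shape for particular label pairs. The cost is more parameters to fit, which is why the size of the support remains a restriction.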

This digest was generated by AI and reviewed by human editors.