QUICK REVIEW

[Paper Review] Deeply Learning the Messages in Message Passing Inference

Guosheng Lin, Chunhua Shen|arXiv (Cornell University)|Jun 6, 2015

Domain Adaptation and Few-Shot Learning | References: 26 | Cited by: 29

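One-Sentence Summary

The paper proposes a deep learning framework in which convolutional neural networks (CNNs) are trained end to end to predict the messages used in message passing inference for conditional random fields (CRFs), removing the need to learn or evaluate potential functions. Learning the message estimators end to end makes training and inference efficient and scales well when the number of classes is large. With a single message passing iteration, the method reaches a mean intersection-over-union (IoU) of 73.4% on the PASCAL VOC 2012 test set, which the abstract reports as the best result for methods using the VOC training images alone.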

ABSTRACT

Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise when performing structured learning for a CRF with CNN potentials it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension for message estimation is the same as the number of classes, in contrast to the network output for general CNN potential functions in CRFs, which is exponential in the order of the potentials. Hence CNN message learning has fewer network parameters and is more scalable for cases that a large number of classes are involved. We apply our method to semantic image segmentation on the PASCAL VOC 2012 dataset. We achieve an intersection-over-union score of 73.4 on its test set, which is the best reported result for methods using the VOC training images alone. This impressive performance demonstrates the effectiveness and usefulness of our CNN message learning method.

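Motivation and Goals

Address the computational inefficiency of jointly learning CRFs with CNN potentials, which requires expensive marginal inference at every SGD step.
Reduce the network output dimension from exponential in the order of the potentials (K^a, for K classes and potential order a) to linear (K) by learning messages directly, improving scalability when there are many classes.
Enable fast inference by training the message estimators for a single message passing iteration.
Show that direct message learning can match or exceed the performance of conventional joint CRF-CNN learning.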
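
To make the dimension argument concrete, here is a minimal Python sketch comparing the output size of a network that predicts a full order-a potential table with one that predicts a message. The function names are hypothetical, and K = 21 (20 object classes plus background in PASCAL VOC) with pairwise potentials (a = 2) is chosen only for illustration.

```python
def potential_output_dim(num_classes: int, order: int) -> int:
    """Outputs needed to score every joint labeling of `order` variables (K^a)."""
    return num_classes ** order


def message_output_dim(num_classes: int) -> int:
    """Outputs needed for one message: one value per class (K)."""
    return num_classes


K = 21  # assumption for illustration: 20 VOC object classes + background
pairwise_dim = potential_output_dim(K, 2)
message_dim = message_output_dim(K)
print(pairwise_dim, message_dim)  # prints "441 21"
```

The gap widens quickly with the order of the potentials: for a = 3 the potential table already needs 9261 outputs, while a message still needs 21.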

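Proposed Method

Train deep CNNs to estimate the messages in message passing inference directly, replacing the need to learn potential functions.
Design message estimator networks whose output dimension equals the number of classes K, avoiding the exponential growth in output dimension as the order of the potentials increases.
Train the message estimators end to end by backpropagation with a standard classification objective, so no iterative inference is needed during training.
Use only one message passing iteration at inference time, reducing runtime while keeping accuracy high.
Apply data augmentation (4 scales and flipping) to improve generalization.
Apply the framework to semantic image segmentation on the PASCAL VOC 2012 dataset.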
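
The prediction step described above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: it assumes each factor connected to a pixel contributes a K-dimensional log-domain message, that the pixel's marginal is the softmax of the summed messages, and that training minimizes cross-entropy on that marginal. Random numbers stand in for the CNN message estimator outputs, and all function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def node_marginal(incoming_messages):
    """Combine K-dimensional log-domain messages from all connected factors."""
    num_classes = len(incoming_messages[0])
    summed = [sum(msg[k] for msg in incoming_messages) for k in range(num_classes)]
    return softmax(summed)


def cross_entropy(marginal, label):
    """Classification loss on the marginal for one pixel."""
    return -math.log(marginal[label])


random.seed(0)
K = 21            # assumption: number of classes
num_factors = 3   # assumption: factors connected to this pixel

# Stand-in for CNN message estimator outputs (one K-vector per factor).
incoming = [[random.gauss(0.0, 1.0) for _ in range(K)] for _ in range(num_factors)]

marginal = node_marginal(incoming)                        # one message passing step
predicted = max(range(K), key=lambda k: marginal[k])      # most probable label
loss = cross_entropy(marginal, predicted)                 # loss if `predicted` were the true label
```

Because the loss depends only on the estimator outputs, gradients can flow back to the networks by ordinary backpropagation, with no inner inference loop over potentials.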

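Experimental Results

Research Questions

RQ1: Can message passing inference for CRFs be learned effectively end to end with deep CNNs, without explicitly modeling potential functions?
RQ2: Does direct message learning give faster training and inference than conventional joint learning of CNNs and CRFs?
RQ3: When the number of classes is large, can message learning maintain or improve performance while reducing model complexity?
RQ4: How does message learning compare with state-of-the-art CRF-CNN methods on standard benchmarks?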

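Key Findings

The method achieves a mean intersection-over-union (IoU) of 73.4% on the PASCAL VOC 2012 test set, which the paper reports as the best result among methods trained on VOC images alone.
It reaches this result using only VOC training images (about 10,000), far fewer than methods that additionally train on COCO.
Its performance is comparable to that of models trained with 133,000 COCO images, which suggests good data efficiency.
High accuracy is obtained with a single message passing iteration, which keeps inference fast and scalable.
The message estimator network has only K outputs (K is the number of classes), so it needs fewer parameters than potential-function-based approaches, especially when K is large.
On most categories of the PASCAL VOC 2012 test set, the method outperforms baseline CRF-CNN models such as DeepLab-CRF, CRF-RNN, and ContextDCRF.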
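
For readers unfamiliar with the metric, the mean IoU quoted above averages per-class intersection-over-union, TP / (TP + FP + FN). A minimal sketch of that computation from a confusion matrix follows; the function name and the toy two-class matrix are hypothetical, and this is not the official VOC evaluation code.

```python
def mean_iou(confusion):
    """confusion[i][j] = number of pixels with true class i predicted as class j."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                  # true c, predicted other
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: class 0 has IoU 3/5, class 1 has IoU 5/7.
toy_confusion = [[3, 1],
                 [1, 5]]
toy_miou = mean_iou(toy_confusion)
```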

This review was generated by AI and reviewed by human editors.