QUICK REVIEW

[Paper Review] What You Expect is NOT What You Get! Questioning Reconstruction/Classification Correlation of Stacked Convolutional Auto-Encoder Features.

Michele Alberti, Mathias Seuret|arXiv (Cornell University)|Mar 13, 2017

Handwritten Text Recognition Techniques | References: 15 | Cited by: 3

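ONE-SENTENCE SUMMARY

This paper challenges the assumption that a high reconstruction score for auto-encoder features implies strong classification performance. Using Stacked Convolutional Auto-Encoders, it shows that the reconstruction score is biased by decoder quality and has no reliable correlation with classification accuracy, and it concludes that classification ability must be evaluated independently.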

ABSTRACT

In this paper, we thoroughly investigate the quality of features produced by deep neural network architectures obtained by stacking and convolving Auto-Encoders. In particular, we are interested into the relation of their reconstruction score with their performance on document layout analysis. When using Auto-Encoders, intuitively one could assume that features which are good for reconstruction will also lead to high classification accuracies. However, we prove that this is not always the case. We examine the reconstruction score, training error and the results obtained if we were to use the same features for both input reconstruction and a classification task. We show that the reconstruction score is not a good metric because it is biased by the decoder quality. Furthermore, experimental results suggest that there is no correlation between the reconstruction score and the quality of features for a classification task and that given the network size and configuration it is not possible to make assumptions on its training error magnitude. Therefore we conclude that both, reconstruction score and training error should not be used jointly to evaluate the quality of the features produced by a Stacked Convolutional Auto-Encoders for a classification task. Consequently one should independently investigate the network classification abilities directly.

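MOTIVATION AND OBJECTIVES

Investigate whether features learned by Stacked Convolutional Auto-Encoders that achieve high reconstruction scores are also effective for classification.
Examine how decoder quality affects the reconstruction score, and whether that score is a reliable proxy for feature quality.
Assess whether the magnitude of the training error can be predicted from network size and configuration.
Determine whether reconstruction performance and classification performance are correlated in Stacked Convolutional Auto-Encoders.
Argue that classification performance should be evaluated directly rather than inferred from reconstruction error or training error.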

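PROPOSED METHOD

Train Stacked Convolutional Auto-Encoders on document layout data, extracting hierarchical features through encoder-decoder learning.
Compute the average error between input images and their reconstructions as a proxy for feature quality.
Use the same learned features for both input reconstruction and a downstream classification task, and compare the results.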
Monitor the training error during optimization to track how well each network fits the training data.
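Run experiments across network architectures and sizes to analyze the relationship between model capacity and error magnitude.
Evaluate classification performance independently on the same features to judge their usefulness for layout analysis.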
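
The two measurements described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `encode`, `decode`, and `classify` are hypothetical stand-ins for a trained encoder, its decoder, and a classifier built on the features, and mean squared error is an assumed choice for the "average error" between input and reconstruction.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mean_reconstruction_error(
    inputs: Sequence[Vector],
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
) -> float:
    """Average per-element squared error between each input and its reconstruction.

    Assumption: squared error is used here; the summary only says "average error".
    """
    total = 0.0
    count = 0
    for x in inputs:
        x_hat = decode(encode(x))  # reconstruction passes through the decoder
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        count += len(x)
    return total / count


def classification_accuracy(
    inputs: Sequence[Vector],
    labels: Sequence[int],
    encode: Callable[[Vector], Vector],
    classify: Callable[[Vector], int],
) -> float:
    """Fraction of inputs whose encoded features receive the correct label."""
    correct = sum(1 for x, y in zip(inputs, labels) if classify(encode(x)) == y)
    return correct / len(labels)
```

Both functions consume the same `encode` callable, which mirrors the idea of reusing one set of features for two tasks. Only the first one depends on `decode`, so only the reconstruction measurement can be influenced by how well the decoder was trained.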

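EXPERIMENTAL RESULTS

Research Questions

RQ1: Is there a significant correlation between reconstruction score and classification accuracy in Stacked Convolutional Auto-Encoders?
RQ2: To what extent is the reconstruction score affected by the quality of the decoder network?
RQ3: Can the magnitude of the training error be reliably predicted from network size and configuration?
RQ4: Can reconstruction score and training error serve as reliable proxies for feature quality in a classification task?
RQ5: Must classification performance be evaluated independently of reconstruction and training metrics?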
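
A question like RQ1 could be checked by collecting one reconstruction score and one classification accuracy per trained network and measuring how the two series move together. The sketch below uses the Pearson coefficient, which is an assumption: the summary does not say which correlation measure the paper uses, and `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series of measurements.

    xs could hold per-network reconstruction scores and ys the matching
    classification accuracies (one pair per trained network).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)
```

A value near 0 across many networks would be consistent with the absence of a linear relationship. A coefficient computed from only a handful of networks is noisy, so it should be read with caution.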

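Key Findings

The reconstruction score is strongly biased by decoder quality, which makes it an unreliable indicator of feature quality.
On the document layout analysis task, no significant correlation was found between reconstruction score and classification performance.
Network size and configuration alone do not allow reliable prediction of the training error magnitude.
Features that reconstruct the input well do not necessarily yield high classification accuracy.
The study concludes that reconstruction score and training error should not be used jointly to evaluate feature quality for a classification task.
Classification performance should be evaluated directly, independently of reconstruction or training metrics.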
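
The decoder-bias finding can be illustrated with a toy example. This is a contrived sketch, not an experiment from the paper: the identity `encode`, the two decoders, and the sample values are all hypothetical, chosen so that the features are identical in both cases and only the decoder changes.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def encode(x: Vector) -> List[float]:
    """Toy encoder: identity mapping, so the features lose no information."""
    return list(x)


def good_decoder(z: Vector) -> List[float]:
    """Toy decoder that inverts the encoder exactly."""
    return list(z)


def weak_decoder(z: Vector) -> List[float]:
    """Toy decoder standing in for a poorly trained one: it halves every value."""
    return [0.5 * v for v in z]


def reconstruction_mse(
    inputs: Sequence[Vector], decoder: Callable[[Vector], List[float]]
) -> float:
    """Mean squared error between inputs and decoder(encode(input))."""
    total = 0.0
    count = 0
    for x in inputs:
        x_hat = decoder(encode(x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        count += len(x)
    return total / count


samples = [[2.0, 4.0], [0.0, 2.0]]  # hypothetical inputs, not real data
score_good = reconstruction_mse(samples, good_decoder)
score_weak = reconstruction_mse(samples, weak_decoder)
```

Here `score_good` and `score_weak` differ even though both decoders receive exactly the same features, so the gap reflects the decoder alone. Any classifier trained on `encode(x)` would see the same inputs in both cases.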


This review was generated by AI and reviewed by a human editor.