QUICK REVIEW

[Paper Review] Distortion Robust Image Classification with Deep Convolutional Neural Network based on Discrete Cosine Transform

Tahmid Hossain, Shyh Wei Teng|arXiv (Cornell University)|Nov 14, 2018
Image Processing Techniques and Applications | Cited by 3
One-Sentence Summary

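This paper proposes DCT-Net, a distortion-robust convolutional neural network module based on the Discrete Cosine Transform (DCT) that improves image classification performance under a variety of distortions. By selectively discarding high-frequency components during training, without prior knowledge of the distortion type or level, DCT-Net generalizes to unseen distortions and outperforms existing methods on the CIFAR-10/100 and ImageNet benchmarks.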

ABSTRACT

Convolutional Neural Network is good at image classification. However, it is found to be vulnerable to image quality degradation. Even a small amount of distortion such as noise or blur can severely hamper the performance of these CNN architectures. Most of the work in the literature strives to mitigate this problem simply by fine-tuning a pre-trained CNN on mutually exclusive or a union set of distorted training data. This iterative fine-tuning process with all known types of distortion is exhaustive and the network struggles to handle unseen distortions. In this work, we propose distortion robust DCT-Net, a Discrete Cosine Transform based module integrated into a deep network which is built on top of VGG16. Unlike other works in the literature, DCT-Net is blind to the distortion type and level in an image both during training and testing. As a part of the training process, the proposed DCT module discards input information which mostly represents the contribution of high frequencies. The DCT-Net is trained blindly only once and applied in generic situation without further retraining. We also extend the idea of traditional dropout and present a training adaptive version of the same. We evaluate our proposed method against Gaussian blur, motion blur, salt and pepper noise, Gaussian noise and speckle noise added to CIFAR-10/100 and ImageNet test sets. Experimental results demonstrate that once trained, DCT-Net not only generalizes well to a variety of unseen image distortions but also outperforms other methods in the literature.

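Research Motivation and Objectives

  • Address the vulnerability of deep CNNs to image quality degradation such as blur and noise.
  • Overcome the limitation of existing methods, which require extensive fine-tuning on known distortion types.
  • Develop a single, general-purpose, trainable module that improves robustness under diverse and unseen distortions.
  • Introduce a DCT-based module that remains blind to the distortion type and level during both training and inference.
  • Improve generalization by filtering out the high-frequency components associated with distortion artifacts.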

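Proposed Method

  • Integrate a DCT-based module (DCT-Net) into the VGG16 architecture to preprocess input features before classification.
  • Apply the DCT to the input feature maps, decomposing them into frequency components and emphasizing low-frequency content.
  • Discard high-frequency components during training to reduce sensitivity to distortion-related artifacts.
  • Train DCT-Net once, blindly, with no supervision on distortion type or level, so that it generalizes to unseen distortions.
  • Introduce a training-adaptive variant of dropout that adjusts regularization strength according to training dynamics.
  • Train end to end, jointly optimizing the DCT-Net module and the classification head on clean and distorted data.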
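The frequency-discarding step can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the summary does not say how coefficients are selected, so the rule used here (zero every coefficient whose index sum `u + v` exceeds a cutoff, with the cutoff drawn at random per input during training) is an assumption, as are the function names and the restriction to a square, single-channel block.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C; C times x is the 1-D DCT of x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def discard_high_freq(block, cutoff):
    """Zero every coefficient with u + v > cutoff, then transform back.

    The u + v rule is a hypothetical selection criterion chosen for
    illustration; the source does not specify one.
    """
    coeffs = dct2(block)
    n = len(coeffs)
    kept = [[coeffs[u][v] if u + v <= cutoff else 0.0 for v in range(n)]
            for u in range(n)]
    return idct2(kept)

def random_discard(block, rng):
    """Training-time variant (assumption): draw a fresh cutoff per input."""
    cutoff = rng.randint(0, 2 * (len(block) - 1))
    return discard_high_freq(block, cutoff)

# Usage: a 4x4 checkerboard is almost pure high frequency.
checkerboard = [[2.0 if (i + j) % 2 else 0.0 for j in range(4)]
                for i in range(4)]
smoothed = discard_high_freq(checkerboard, 0)    # keeps only the DC term
augmented = random_discard(checkerboard, random.Random(0))
```

With a cutoff of 0 only the DC coefficient survives, so the block collapses to its mean value; with the largest cutoff, `2 * (n - 1)`, nothing is discarded and the input is reconstructed. A random cutoff exposes the network to inputs at many levels of missing detail without ever showing it a labeled distortion.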

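Experimental Results

Research Questions

  • RQ1: Can a DCT-based module improve the robustness of deep CNNs to diverse image distortions without knowledge of the distortion type or level?
  • RQ2: Does filtering out high-frequency components during training improve generalization to unseen distortions?
  • RQ3: How does the proposed DCT-Net compare with existing fine-tuned models on standard benchmarks across multiple distortion types?
  • RQ4: Can a single, unified DCT-Net module outperform models fine-tuned for specific tasks across multiple distortion scenarios?
  • RQ5: Does the adaptive dropout mechanism in DCT-Net improve generalization and robustness during training?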

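Key Findings

  • On CIFAR-10 and CIFAR-100, DCT-Net shows superior performance under Gaussian blur, motion blur, salt and pepper noise, Gaussian noise, and speckle noise.
  • The model generalizes to unseen distortions without fine-tuning, showing robustness beyond the training distribution.
  • DCT-Net outperforms existing methods that rely on iterative fine-tuning for each distortion type.
  • The DCT-based filtering mechanism reduces sensitivity to high-frequency noise and blur artifacts.
  • The training-adaptive dropout component helps improve generalization and stability during training.
  • The method maintains high accuracy on ImageNet even when inputs are corrupted by multiple distortions, supporting its scalability to large datasets.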
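For readers who want to reproduce this kind of evaluation, four of the five distortion families named above can be simulated as in the sketch below. It is an illustration only: the function names, the `[0, 1]` grayscale nested-list image format, and every parameter value are assumptions rather than the paper's settings, and Gaussian blur is omitted for brevity.

```python
import random

def clip01(value):
    """Clamp a pixel value to the valid [0, 1] range."""
    return min(1.0, max(0.0, value))

def gaussian_noise(img, sigma, rng):
    """Additive noise: p + n, with n drawn from N(0, sigma^2)."""
    return [[clip01(p + rng.gauss(0.0, sigma)) for p in row] for row in img]

def speckle_noise(img, sigma, rng):
    """Multiplicative noise: p + p * n, with n drawn from N(0, sigma^2)."""
    return [[clip01(p + p * rng.gauss(0.0, sigma)) for p in row]
            for row in img]

def salt_and_pepper(img, density, rng):
    """Set a fraction `density` of pixels to 0 (pepper) or 1 (salt)."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < density / 2:
                new_row.append(0.0)
            elif r < density:
                new_row.append(1.0)
            else:
                new_row.append(p)
        out.append(new_row)
    return out

def motion_blur_horizontal(img, length):
    """Average each pixel with the next length - 1 pixels to its right.

    Pixels past the right edge are replaced by the last pixel in the row.
    """
    width = len(img[0])
    return [[sum(row[min(x + k, width - 1)] for k in range(length)) / length
             for x in range(width)] for row in img]

# Usage: distort a small synthetic gradient image at illustrative levels.
rng = random.Random(0)
image = [[x / 7.0 for x in range(8)] for _ in range(8)]
distorted = {
    "gaussian_noise": gaussian_noise(image, 0.1, rng),
    "speckle_noise": speckle_noise(image, 0.1, rng),
    "salt_and_pepper": salt_and_pepper(image, 0.05, rng),
    "motion_blur": motion_blur_horizontal(image, 3),
}
```

Sweeping `sigma`, `density`, or `length` over a range of values gives the per-level accuracy curves that distortion-robustness studies typically report.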


This review was generated by AI and reviewed by a human editor.