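[Paper Digest] Examining the Impact of Blur on Recognition by Convolutional Networks
This paper analyzes how blur degrades CNN-based recognition and shows that fine-tuning on blurred images recovers most of the lost accuracy and induces blur-invariant representations; the robustness transfers across blur types and also improves segmentation under blur.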
State-of-the-art algorithms for many semantic visual tasks are based on the use of convolutional neural networks. These networks are commonly trained, and evaluated, on large annotated datasets of artifact-free high-quality images. In this paper, we investigate the effect of one such artifact that is quite common in natural capture settings: optical blur. We show that standard network models, trained only on high-quality images, suffer a significant degradation in performance when applied to those degraded by blur due to defocus, or subject or camera motion. We investigate the extent to which this degradation is due to the mismatch between training and input image statistics. Specifically, we find that fine-tuning a pre-trained model with blurred images added to the training set allows it to regain much of the lost accuracy. We also show that there is a fair amount of generalization between different degrees and types of blur, which implies that a single network model can be used robustly for recognition when the nature of the blur in the input is unknown. We find that this robustness arises as a result of these models learning to generate blur invariant representations in their hidden layers. Our findings provide useful insights towards developing vision systems that can perform reliably on real world images affected by blur.
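Research Motivation and Objectives
- Evaluate how optical blur affects image classification and segmentation by CNNs trained on high-quality images.
- Quantify the performance drop caused by defocus, motion, and camera-shake blur on ImageNet and VOC2012 tasks.
- Investigate whether fine-tuning on blurred images restores accuracy and introduces blur invariance into the learned representations.
- Explore generalization across blur types and compare mixed-blur fine-tuning with explicit deblurring.
Proposed Method
- Evaluate VGG-16 (pre-trained on ImageNet) on versions of the ImageNet validation images blurred with several kernels (defocus, motion, camera shake, Gaussian).
- Fine-tune on a mixed set of sharp and blurred images at a fixed scale, and also with scale variation, to assess robustness.
- Analyze layer-wise activation similarity under blur to understand where blur invariance arises.
- Compare mixed-blur fine-tuning with explicit deblurring using a known kernel.
- Extend the analysis to semantic segmentation by testing a Zoomout-based network on blurred VOC2012 inputs.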
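To make the blur models and the sharp/blurred training mix listed above more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' pipeline: defocus is modeled as a normalized disc (assuming the D-numbers in the results denote a disc radius in pixels, which this digest does not state), the Gaussian kernel is truncated at 3σ, images are 2D grayscale arrays, boundaries are circular, and the function names and the 50/50 mixing ratio `p_sharp` are hypothetical.

```python
import numpy as np

def disc_kernel(radius):
    """Defocus blur modeled as a normalized disc (assumed model)."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = (x**2 + y**2 <= radius**2).astype(np.float64)
    return k / k.sum()

def gaussian_kernel(sigma):
    """Normalized 2D Gaussian kernel, truncated at 3 sigma (assumed cutoff)."""
    r = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return k / k.sum()

def blur(image, kernel):
    """Convolve a 2D image with a kernel via FFT (circular boundary).

    The kernel must be no larger than the image.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    # Shift the kernel so its center sits at index (0, 0); this keeps the
    # blurred image aligned with the input instead of translated.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def mix_sharp_and_blurred(images, kernels, p_sharp=0.5, seed=0):
    """Keep each image sharp with probability p_sharp (hypothetical ratio),
    otherwise blur it with a kernel drawn uniformly from `kernels`."""
    rng = np.random.default_rng(seed)
    out = []
    for img in images:
        if rng.random() < p_sharp:
            out.append(img)
        else:
            out.append(blur(img, kernels[rng.integers(len(kernels))]))
    return np.stack(out)
```

A batch built with `mix_sharp_and_blurred(images, [disc_kernel(4), gaussian_kernel(2)])` would then be fed to whatever fine-tuning loop is in use. Motion and camera-shake kernels are omitted here because the digest gives no detail on how they were generated.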
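Experimental Results
Research Questions
- RQ1: How does blur affect the Top-5 accuracy and prediction confidence of a CNN trained on sharp images?
- RQ2: Does fine-tuning on blurred data restore accuracy and produce blur-invariant internal representations?
- RQ3: Does robustness carry over across blur types and degrees, and how does scale affect it?
- RQ4: Is mixed-blur fine-tuning more effective than deblurring first and then applying a classifier trained on sharp images?
- RQ5: Do the gains from blur-robust training also extend to semantic segmentation?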
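The quantities these questions refer to can be computed from model outputs roughly as follows. This is a hedged sketch: Top-5 accuracy and softmax entropy are standard definitions, but the digest does not say which similarity measure the paper uses for hidden-layer activations, so the cosine similarity here is an assumption, and all function names are illustrative.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def top5_accuracy(logits, labels):
    """Fraction of samples whose true label is among the 5 highest logits."""
    top5 = np.argsort(logits, axis=1)[:, -5:]
    return float(np.mean((top5 == labels[:, None]).any(axis=1)))

def mean_entropy(logits):
    """Mean Shannon entropy (nats) of the predicted class distribution.

    Higher entropy means less confident predictions.
    """
    p = softmax(logits)
    return float(np.mean(-np.sum(p * np.log(p + 1e-12), axis=1)))

def activation_similarity(acts_sharp, acts_blurred):
    """Mean cosine similarity between per-sample activations of one layer
    for sharp and blurred versions of the same images (assumed measure)."""
    a = acts_sharp.reshape(len(acts_sharp), -1)
    b = acts_blurred.reshape(len(acts_blurred), -1)
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float(np.mean(num / den))
```

Comparing `mean_entropy` on sharp versus blurred inputs addresses the confidence part of RQ1, and evaluating `activation_similarity` layer by layer before and after fine-tuning is one way to probe RQ2.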
Key Findings
| Model / Scale | Blur Type | Top-5 Accuracy |
|---|---|---|
| 128 | Sharp | 76.07% |
| 128 | D2 | 74.83% |
| 128 | D4 | 68.48% |
| 128 | D6 | 61.03% |
| 128 | D8 | 53.34% |
| 128 | Camera Shake | 58.91% |
| 128 | Gaussian σ=4 | 56.34% |
| 256 | Sharp | 90.88% |
| 256 | D4 | 81.48% |
| 256 | D8 | 60.97% |
| 256+512 | Sharp | 92.17% |
| 256+512 | D4 | 80.93% |
| 256+512 | D8 | 51.40% |
| 512 | Sharp | 90.76% |
| 512 | D8 | 22.52% |
| 512 | Gaussian σ=8 | 3.41% |
| Fine-tuned (mix) 256 | Sharp | 91.03% |
| Fine-tuned (mix) 256 | D8 | 87.01% |
| Fine-tuned (mix) 512 | Sharp | 85.99% |
| Fine-tuned (per-scale) 256+512 | Sharp | 91.10% |
| Original | Sharp | 90.60% |
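- Blur markedly lowers CNN accuracy, and the drop grows with the size of the blur kernel.
- Fine-tuning on a mix of sharp and blurred images recovers most of the lost accuracy on blurred inputs with almost no loss on sharp images.
- Training on blur at multiple scales generalizes across blur types; transfer between defocus and camera-shake blur helps in both directions, although cross-blur generalization is not perfect.
- After blur-robust training, prediction entropy rises less on sharp images than on blurred ones, and predictions on blurred inputs are made with higher confidence.
- Mixed-blur fine-tuning at a fixed scale (256) gives strong robustness; multi-scale (256+512) may bring a marginal gain at a higher cost, while per-scale networks improve little.
- Compared with explicit deblurring, blur-robust fine-tuning reaches similar or better accuracy at a much lower computational cost, since deblurring is expensive.
- In semantic segmentation (VOC2012), fine-tuning with blur raises mIOU on blurred images, but the gap to sharp images remains larger than in classification.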
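As a rough check on "recovers most of the lost accuracy", the scale-256 rows of the table can be combined as below. This assumes the original and fine-tuned rows were measured on the same validation images and are directly comparable, which the digest does not confirm; the variable names are illustrative.

```python
# Top-5 accuracies (%) copied from the scale-256 rows of the table.
sharp_original = 90.88     # original model, sharp input
blurred_original = 60.97   # original model, D8 blur
blurred_finetuned = 87.01  # fine-tuned (mix) model, D8 blur
sharp_finetuned = 91.03    # fine-tuned (mix) model, sharp input

# Accuracy lost to D8 blur before fine-tuning, in percentage points.
lost = sharp_original - blurred_original
# Points regained on D8 input after mixed-blur fine-tuning.
recovered = blurred_finetuned - blurred_original
# Share of the lost accuracy that fine-tuning wins back.
fraction_recovered = recovered / lost
# Change on sharp input caused by fine-tuning.
sharp_change = sharp_finetuned - sharp_original

print(f"lost: {lost:.2f} pts, recovered: {recovered:.2f} pts "
      f"({fraction_recovered:.1%}), sharp change: {sharp_change:+.2f} pts")
```

Under that assumption, D8 blur costs about 29.9 points, fine-tuning wins back about 26.0 of them (roughly 87%), and accuracy on sharp input shifts by only +0.15 points.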
This digest was generated by AI and reviewed by a human editor.