QUICK REVIEW

[Paper Review] Fixing the train-test resolution discrepancy

Hugo Touvron, Andrea Vedaldi|arXiv (Cornell University)|Jun 14, 2019

Advanced Neural Network Applications | References: 44 | Cited by: 23

One-Sentence Summary

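This paper addresses the train-test resolution discrepancy in image classification: because standard data augmentation makes objects appear at different sizes during training and testing, training at a lower resolution than the test resolution improves test-time accuracy. It proposes a lightweight fine-tuning step that adapts a model trained at low resolution to a higher test resolution. With a ResNeXt-101 32x48d trained on 224×224 images and fine-tuned at 320×320, it reaches 86.4% top-1 accuracy on ImageNet, the state of the art at the time.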

ABSTRACT

Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128x128 images, and 79.8% with one trained on 224x224 images. In addition, if we use extra training data we get 82.5% with the ResNet-50 trained with 224x224 images. Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop). To the best of our knowledge this is the highest ImageNet single-crop, top-1 and top-5 accuracy to date.

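Research Motivation and Objectives

Identify and address the distribution shift in image classification caused by using different resolutions at training and inference time.
Improve generalization and inference-time accuracy by aligning how training and test data are presented to the model across resolutions.
Enable faster, more efficient training by using low-resolution training crops while maintaining high test accuracy.
Develop a computationally cheap method for adapting a pretrained model to a higher test resolution after training.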
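The size mismatch behind these objectives can be estimated with a short calculation. The sketch below is an illustration, not the paper's derivation. It assumes a square input image, a training crop that covers a uniformly sampled 8% to 100% of the image area and is resized to the training resolution (the default scale range of torchvision's `RandomResizedCrop`), and a test pipeline that resizes and then center-crops the middle 87.5% of each side (the common 256 to 224 recipe). Aspect-ratio jitter is ignored, and `mean_apparent_size_ratio` is a hypothetical helper name.

```python
import math


def mean_apparent_size_ratio(k_train, k_test, s_min=0.08, s_max=1.0,
                             center_fraction=0.875):
    """Average apparent object size at train time divided by that at test time.

    Assumptions (not taken from the paper): square images, crop area fraction
    s ~ Uniform(s_min, s_max) at train time, resize + center crop at test time.
    """
    # Train: a crop of side sqrt(s) * L is resized to k_train pixels, so objects
    # are magnified by k_train / (sqrt(s) * L). E[1/sqrt(s)] has a closed form.
    mean_inv_sqrt_s = 2.0 * (math.sqrt(s_max) - math.sqrt(s_min)) / (s_max - s_min)
    train_magnification = k_train * mean_inv_sqrt_s

    # Test: the image is resized to k_test / center_fraction, then center-cropped,
    # so objects are magnified by k_test / (center_fraction * L).
    test_magnification = k_test / center_fraction

    # The image side length L cancels in the ratio.
    return train_magnification / test_magnification


for k_test in (224, 288, 320):
    print(k_test, round(mean_apparent_size_ratio(224, k_test), 2))
```

Under these assumptions, objects appear on average about 1.36 times larger during training than during testing when both use 224×224 inputs, and raising the test resolution moves the ratio toward 1.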

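Proposed Method

Train the classifier at a lower resolution (e.g., 128×128 or 160×160) to reduce training time and GPU memory use.
At inference time, use higher-resolution crops (e.g., 224×224 or 320×320) to better match the apparent size of objects seen during training.
Fine-tune only the final fully connected layer and the batch-normalization layers so the model adapts to the new resolution.
Use standard data augmentation during training, but adjust how the Region of Classification (RoC) is sampled to reduce scale variance.
Leverage models pretrained on large-scale weakly supervised datasets and apply resolution adaptation to improve their performance.
Apply the method to both standard and large-scale models, including ResNet-50, PNASNet-5-Large, and ResNeXt-101 32x48d.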
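The fine-tuning step in the list above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' released code: `TinyNet` is a hypothetical stand-in for a ResNet, `prepare_for_resolution_finetune` is a hypothetical helper, the classifier attribute is assumed to be named `fc`, and the optimizer settings are placeholders.

```python
import torch
from torch import nn


def prepare_for_resolution_finetune(model: nn.Module, classifier_name: str = "fc"):
    """Freeze everything except batch-norm layers and the final classifier.

    Returns the list of parameters left trainable.
    """
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

    # Freeze all weights and put every module in eval mode first.
    for p in model.parameters():
        p.requires_grad = False
    model.eval()

    for name, module in model.named_modules():
        if isinstance(module, bn_types):
            module.train()  # let running statistics adapt to the new resolution
            for p in module.parameters():
                p.requires_grad = True
        elif name == classifier_name:
            for p in module.parameters():
                p.requires_grad = True

    return [p for p in model.parameters() if p.requires_grad]


class TinyNet(nn.Module):
    """Hypothetical toy network standing in for a ResNet."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.pool = nn.AdaptiveAvgPool2d(1)  # makes the net resolution-agnostic
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        return self.fc(self.pool(x).flatten(1))


model = TinyNet()
trainable = prepare_for_resolution_finetune(model)
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)  # placeholder settings

# One fine-tuning step on a random batch at the (higher) test resolution.
images = torch.randn(4, 3, 320, 320)
labels = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Calling `model.train()` afterwards would switch every module back to training mode, so a real training loop would need to reapply the helper after any such call.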

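Experimental Results

Research Questions

RQ1: Does a resolution mismatch between training and inference affect the performance of image classification models?
RQ2: Can training at a lower resolution improve inference-time accuracy despite the smaller inputs?
RQ3: Can a simple fine-tuning pass at the test resolution effectively compensate for the resolution mismatch?
RQ4: Does the proposed method improve performance on both standard and large-scale models?
RQ5: Can the method be applied effectively in transfer-learning settings where test inputs have a higher resolution?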

Key Findings

A ResNet-50 trained on 128×128 images reaches 77.1% top-1 accuracy on ImageNet, outperforming the standard 224×224 training setup.
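A ResNet-50 trained on 224×224 images and fine-tuned at a higher test resolution reaches 79.8% top-1 accuracy, demonstrating the effectiveness of resolution adaptation.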
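A ResNeXt-101 32x48d pretrained on 940 million images at 224×224 and fine-tuned at 320×320 reaches 86.4% top-1 accuracy (98.0% top-5), setting a new single-crop state of the art on ImageNet.
The method also improves results on several transfer-learning benchmarks, including iNaturalist, Stanford Cars, and Oxford-102 Flowers.
The gains are larger at higher test resolutions, suggesting the method becomes more relevant as image quality increases.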
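The method yields substantial training speedups (for example, roughly 3x at half resolution) and lower GPU memory use, without sacrificing final accuracy.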
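One rough way to see where a speedup of this order could come from: in a convolutional network that ends in global pooling, the cost of the convolutional layers grows roughly in proportion to the number of input pixels. The sketch below computes that pixel ratio. `relative_conv_cost` is a hypothetical helper, and the figure is a back-of-the-envelope estimate, not a measurement from the paper; actual wall-clock gains depend on hardware, batch size, and data loading.

```python
def relative_conv_cost(high_res: int, low_res: int) -> float:
    """Approximate cost ratio of convolutional layers on square inputs.

    Assumes cost scales linearly with pixel count, which ignores fixed
    overheads such as the classifier head and data loading.
    """
    return (high_res * high_res) / (low_res * low_res)


# Going from 224x224 to 128x128 inputs cuts the pixel count by about 3x.
print(relative_conv_cost(224, 128))
```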

This review was generated by AI and reviewed by a human editor.