QUICK REVIEW

[Paper Review] Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond

Krishna Kumar Singh, Hao Yu|arXiv (Cornell University)|Nov 6, 2018

Human Pose and Action Recognition | References: 37 | Cited by: 68

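One-Sentence Summary

Hide-and-Seek randomly hides image patches during training to force the network to learn multiple object parts, which improves weakly-supervised localization without any architectural change and generalizes to a range of vision tasks.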

ABSTRACT

We propose 'Hide-and-Seek' a general purpose data augmentation technique, which is complementary to existing data augmentation techniques and is beneficial for various visual recognition tasks. The key idea is to hide patches in a training image randomly, in order to force the network to seek other relevant content when the most discriminative content is hidden. Our approach only needs to modify the input image and can work with any network to improve its performance. During testing, it does not need to hide any patches. The main advantage of Hide-and-Seek over existing data augmentation techniques is its ability to improve object localization accuracy in the weakly-supervised setting, and we therefore use this task to motivate the approach. However, Hide-and-Seek is not tied only to the image localization task, and can generalize to other forms of visual input like videos, as well as other recognition tasks like image classification, temporal action localization, semantic segmentation, emotion recognition, age/gender estimation, and person re-identification. We perform extensive experiments to showcase the advantage of Hide-and-Seek on these various visual recognition problems.

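Motivation and Objectives

Introduce a general-purpose data augmentation technique that complements existing methods.
Improve object localization in the weakly-supervised setting without additional annotations.
Demonstrate the method's applicability across multiple tasks and architectures.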

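Proposed Method

Divide each training image into a grid of patches of size S×S and, during training, hide each patch independently with probability p_hide.
Set hidden pixel values to the dataset mean so that activation distributions stay consistent between training and testing.
Apply the technique to CNNs (e.g., AlexNet, GoogLeNet) and use CAM/GAP for localization.
Extend the method to video by hiding frame segments during training, for temporal action localization.
Evaluate on tasks including weakly-supervised object localization, semantic segmentation, and temporal action localization.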
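The patch-hiding step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `hide_patches`, the float image layout (H, W, C), the value p_hide = 0.5, and the per-channel mean values are all assumptions made for the example.

```python
import numpy as np


def hide_patches(image, patch_size, p_hide, fill_value, rng=None):
    """Return a copy of `image` (H, W, C) with random grid patches overwritten.

    Hypothetical helper, not the authors' code. Each patch_size x patch_size
    cell of the grid is hidden independently with probability p_hide.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    height, width = image.shape[:2]
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            if rng.random() < p_hide:
                # Slicing clips at the border, so edge patches may be smaller.
                out[top:top + patch_size, left:left + patch_size] = fill_value
    return out


# Assumed setup: a random float image and placeholder per-channel mean values.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
dataset_mean = np.array([0.485, 0.456, 0.406])  # illustrative, not from the paper
augmented = hide_patches(image, patch_size=56, p_hide=0.5,
                         fill_value=dataset_mean, rng=rng)
```

The function would be applied only to training images; at test time the full image is passed to the network unchanged, as the abstract states.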

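Experimental Results

Research Questions

RQ1: Does randomly hiding patches during training improve localization beyond what standard data augmentation achieves?
RQ2: Is Hide-and-Seek effective across multiple architectures and vision tasks?
RQ3: How should the values of hidden patches be set to minimize the mismatch between training and test distributions?
RQ4: Can the method be extended from images to video for temporal localization?
RQ5: How do patch size and its variability affect performance?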
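For the video question, a temporal analogue of patch hiding could look like the sketch below. It is an illustration under stated assumptions, not the paper's procedure: the name `hide_frame_segments`, the fixed `segment_len`, the (T, H, W, C) clip layout, and the zero fill value are all hypothetical choices.

```python
import numpy as np


def hide_frame_segments(frames, segment_len, p_hide, fill_value=0.0, rng=None):
    """Return a copy of `frames` (T, ...) with random contiguous segments hidden.

    Hypothetical helper: the clip is cut into consecutive segments of
    `segment_len` frames, and each segment is hidden with probability p_hide.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = frames.copy()
    for start in range(0, frames.shape[0], segment_len):
        if rng.random() < p_hide:
            # The last segment may be shorter than segment_len.
            out[start:start + segment_len] = fill_value
    return out


# Assumed toy clip: 30 frames of 8x8 RGB, hidden in segments of 5 frames.
clip = np.ones((30, 8, 8, 3))
hidden_clip = hide_frame_segments(clip, segment_len=5, p_hide=0.5,
                                  rng=np.random.default_rng(0))
```

As with images, hiding would apply only during training, so that the network has to rely on frames other than the most discriminative ones.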

Key Findings

Method	GT-known Loc	Top-1 Loc
AlexNet-GAP (baseline)	54.90	36.25
AlexNet-HaS-16	57.86	36.77
AlexNet-HaS-32	58.75	37.33
AlexNet-HaS-44	58.55	37.54
AlexNet-HaS-56	58.43	37.34
AlexNet-HaS-Mixed	58.68	37.65
GoogLeNet-GAP (baseline)	58.41	43.60
GoogLeNet-HaS-16	59.83	44.62
GoogLeNet-HaS-32	60.29	45.21
GoogLeNet-HaS-44	60.11	44.75
GoogLeNet-HaS-56	59.93	44.78

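On ILSVRC 2016, Hide-and-Seek improves both GT-known Loc and Top-1 Loc over the baseline at every patch size tested.
GoogLeNet-HaS outperforms GoogLeNet-GAP on both localization metrics at all tested patch sizes.
Relative to the full-image baselines, AlexNet-HaS gains 2.96 to 3.85 points in GT-known Loc and 0.52 to 1.40 points in Top-1 Loc, while GoogLeNet-HaS gains 1.42 to 1.88 and 1.02 to 1.61 points, respectively.
The mixed-size variant (HaS-Mixed) achieves the best Top-1 Loc on AlexNet (37.65).
Hide-and-Seek also improves performance on several other tasks, including image classification, semantic segmentation, emotion recognition, and person re-identification.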
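The per-variant gains implied by the table can be recomputed with a short script. The numbers are copied from the table above; the dictionary layout and the helper names `gains` and `best_variant` are illustrative assumptions.

```python
# Localization accuracy (%) as (GT-known Loc, Top-1 Loc), copied from the table.
results = {
    "AlexNet": {
        "baseline": (54.90, 36.25),
        "HaS-16": (57.86, 36.77),
        "HaS-32": (58.75, 37.33),
        "HaS-44": (58.55, 37.54),
        "HaS-56": (58.43, 37.34),
        "HaS-Mixed": (58.68, 37.65),
    },
    "GoogLeNet": {
        "baseline": (58.41, 43.60),
        "HaS-16": (59.83, 44.62),
        "HaS-32": (60.29, 45.21),
        "HaS-44": (60.11, 44.75),
        "HaS-56": (59.93, 44.78),
    },
}


def gains(arch):
    """Improvement of each HaS variant over the baseline, in percentage points."""
    base_gt, base_top1 = results[arch]["baseline"]
    return {
        name: (round(gt - base_gt, 2), round(top1 - base_top1, 2))
        for name, (gt, top1) in results[arch].items()
        if name != "baseline"
    }


def best_variant(arch, metric_index):
    """Name of the HaS variant with the highest score (0: GT-known, 1: Top-1)."""
    variants = [name for name in results[arch] if name != "baseline"]
    return max(variants, key=lambda name: results[arch][name][metric_index])


for arch in results:
    print(arch, gains(arch))
    print(arch, "best Top-1 Loc variant:", best_variant(arch, 1))
```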

This review was generated by AI and reviewed by a human editor.