QUICK REVIEW

[Paper Review] Deep Cropping via Attention Box Prediction and Aesthetics Assessment

Wenguan Wang, Jianbing Shen | arXiv (Cornell University) | Oct 22, 2017

Visual Attention and Saliency Detection | Cited by 27

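One-Sentence Summary

This paper proposes a deep-learning approach to image cropping that first predicts an attention box to identify the visually important region, then generates crop candidates around that box and selects the final crop by aesthetics assessment; by sharing features across the two tasks and training on large-scale attention and aesthetics datasets, it runs at 5 fps with all steps included while outperforming the compared methods on the reported benchmarks.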

ABSTRACT

We model the photo cropping problem as a cascade of attention box regression and aesthetic quality classification, based on deep learning. A neural network is designed that has two branches for predicting attention bounding box and analyzing aesthetics, respectively. The predicted attention box is treated as an initial crop window where a set of cropping candidates are generated around it, without missing important information. Then, aesthetics assessment is employed to select the final crop as the one with the best aesthetic quality. With our network, cropping candidates share features within full-image convolutional feature maps, thus avoiding repeated feature computation and leading to higher computation efficiency. Via leveraging rich data for attention prediction and aesthetics assessment, the proposed method produces high-quality cropping results, even with the limited availability of training data for photo cropping. The experimental results demonstrate the competitive results and fast processing speed (5 fps with all steps).
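The feature sharing described in the abstract can be illustrated with a minimal sketch: the convolutional feature map is computed once for the full image, and each cropping candidate only slices and pools its own region from that map. The function name `crop_features`, the stride of 16, the 3x3 output grid, and the use of max pooling (in the spirit of RoI pooling) are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def crop_features(feature_map, box, stride=16, out_size=3):
    """Pool the features under `box` into an out_size x out_size grid.

    feature_map: array of shape (C, H, W), computed once for the full image.
    box: (x1, y1, x2, y2) in image pixels, assumed to lie inside the image.
    stride: assumed downsampling factor between image and feature map.
    """
    channels, height, width = feature_map.shape
    x1, y1, x2, y2 = box
    # Map pixel coordinates onto the feature grid, keeping at least one cell.
    fx1 = min(int(x1 // stride), width - 1)
    fy1 = min(int(y1 // stride), height - 1)
    fx2 = min(max(int(np.ceil(x2 / stride)), fx1 + 1), width)
    fy2 = min(max(int(np.ceil(y2 / stride)), fy1 + 1), height)
    region = feature_map[:, fy1:fy2, fx1:fx2]
    # Split the region into a fixed grid so every candidate yields the same shape.
    ys = np.linspace(0, region.shape[1], out_size + 1).astype(int)
    xs = np.linspace(0, region.shape[2], out_size + 1).astype(int)
    pooled = np.zeros((channels, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Each cell covers at least one row and one column of the region.
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            pooled[:, i, j] = region[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return pooled

# Stand-in for the conv features of a 96x64 image (2 channels, stride 16).
feature_map = np.arange(2 * 4 * 6, dtype=float).reshape(2, 4, 6)
full = crop_features(feature_map, (0, 0, 96, 64))    # whole image
corner = crop_features(feature_map, (0, 0, 32, 32))  # top-left candidate
```

Both calls reuse the same `feature_map`, so adding more candidates costs only slicing and pooling rather than another pass through the convolutional layers.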

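Motivation and Objectives

Address the inefficiency and unnatural search strategy of conventional sliding-window cropping methods.
Improve crop quality by integrating human visual attention and aesthetic judgment in a single deep-learning framework.
Reduce dependence on scarce expert-annotated cropping datasets by exploiting abundant attention and aesthetics data.
Achieve high computational efficiency through feature sharing and local candidate generation.
Model cropping as a natural two-stage process: first determine an initial crop region from attention, then refine it through aesthetics assessment.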

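Proposed Method

Uses a fully convolutional neural network in which two branches share the bottom layers: one for attention box prediction (ABP) and one for aesthetics assessment (AA).
Predicts the attention bounding box by regression, identifying the most visually salient region as the initial crop window.
Generates a small set of crop candidates (about 1,000) around the predicted attention box to restrict the search space.
Shares the early convolutional features between the ABP and AA networks to reduce computation at inference time.
Runs the network only once on the full image to extract the shared feature maps, then crops features for each candidate region without reprocessing the image.
Selects the candidate that receives the highest score from the aesthetics assessment network as the final crop.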
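The candidate generation and selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the margin grid, the choice to only grow the window outward so that every candidate contains the attention box, and the placeholder `toy_aesthetic_score` (which merely prefers a 4:3 aspect ratio) are hypothetical and stand in for the paper's actual candidate scheme and its learned AA network.

```python
import itertools

def generate_candidates(attention_box, image_size,
                        margins=(0, 20, 40, 60, 80, 100)):
    """Enumerate crop windows around the attention box.

    Each side is pushed outward by one of `margins` pixels (an assumed grid),
    so every candidate still contains the attention box. Six margins on four
    sides give at most 6**4 = 1296 windows, roughly the "about 1,000" scale.
    """
    width, height = image_size
    ax1, ay1, ax2, ay2 = attention_box
    unique = set()
    for left, top, right, bottom in itertools.product(margins, repeat=4):
        unique.add((max(0, ax1 - left), max(0, ay1 - top),
                    min(width, ax2 + right), min(height, ay2 + bottom)))
    return sorted(unique)

def toy_aesthetic_score(box):
    """Hypothetical stand-in for the AA network: prefer a 4:3 aspect ratio."""
    x1, y1, x2, y2 = box
    return -abs((x2 - x1) / (y2 - y1) - 4 / 3)

def select_crop(candidates, score_fn):
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)

attention_box = (200, 150, 400, 300)
candidates = generate_candidates(attention_box, image_size=(640, 480))
best = select_crop(candidates, toy_aesthetic_score)
```

Restricting the search to windows near the attention box is what keeps the candidate count in the low thousands, in contrast to an exhaustive sliding-window scan over the whole image.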

Figure 1: (a)-(c) Flowchart of our method. (d) Conventional methods apply sliding-judging cropping strategy, which is time-consuming and violates natural cropping procedure. (e) Our method works as a cascade of attention-aware crop candidates generation and aesthetics-based crop window selection, w...

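Experimental Results

Research Questions

RQ1: Can a cascade that combines attention-guided candidate generation with aesthetics-based selection outperform conventional sliding-window methods on image cropping?
RQ2: How effectively does sharing features between attention prediction and aesthetics assessment improve efficiency without sacrificing accuracy?
RQ3: To what extent can models pretrained on large-scale attention and aesthetics datasets compensate for the shortage of cropping-specific annotations?
RQ4: Does modeling cropping as a determine-then-adjust process match human cropping behavior better than end-to-end sliding-window optimization?
RQ5: What trade-off exists between computational efficiency and cropping accuracy in real-time applications?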

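Key Findings

On the MSR-ICD dataset, the proposed method reaches an IoU of 0.813, well above prior methods such as LCC (0.748) and ATC (0.605).
On the FLMS dataset, the method reaches an IoU of 0.810 and a boundary displacement error (BDE) of 0.057, outperforming all compared methods (for example, VBC at IoU 0.74 and MPC at IoU 0.41).
The method processes 5 images per second with all steps included, a level of computational efficiency suited to real-time applications.
The shared-feature design removes redundant computation, giving fast inference while maintaining high accuracy.
Although cropping-specific training data are limited, the model generalizes well by making effective use of the abundant attention and aesthetics datasets.
Qualitative results show that the predicted attention boxes agree closely with human-annotated salient regions, and that the final crops are visually pleasing and well composed.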
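For readers unfamiliar with the two metrics quoted above, the sketch below computes them for a single predicted crop. It assumes the definitions commonly used in the cropping literature: IoU as intersection area over union area, and BDE as the mean displacement of the four crop edges after normalizing by image width or height. The function names and the (x1, y1, x2, y2) box layout are illustrative choices, and the example numbers are made up rather than taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes; higher is better."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def bde(pred, gt, image_size):
    """Boundary displacement error: mean normalized edge offset; lower is better."""
    width, height = image_size
    return (abs(pred[0] - gt[0]) / width + abs(pred[1] - gt[1]) / height +
            abs(pred[2] - gt[2]) / width + abs(pred[3] - gt[3]) / height) / 4

pred_box = (0, 0, 100, 100)
gt_box = (50, 0, 150, 100)
example_iou = iou(pred_box, gt_box)              # 5000 / 15000
example_bde = bde(pred_box, gt_box, (200, 100))  # (0.25 + 0 + 0.25 + 0) / 4
```

Under these definitions a perfect crop has an IoU of 1 and a BDE of 0, so the reported IoU is better when higher and the reported BDE is better when lower.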

Figure 2: (a) Input image. (b) Attention map. (c) Ground truth attention box generation via [3]. (d) Positive (red) and negative (blue) default boxes are generated for training the ABP network according to the ground truth attention box.
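Panel (d) of the caption describes splitting default boxes into positives and negatives relative to the ground truth attention box. One common way to do this is by IoU overlap, sketched below. The thresholds (0.5 for positives, 0.3 for negatives), the "ignored" label for boxes in between, and the function names are assumptions for illustration; the caption does not state the rule the paper actually uses.

```python
import numpy as np

def iou_with_boxes(boxes, gt_box):
    """IoU between each row of `boxes` (N, 4) and one ground truth box."""
    boxes = np.asarray(boxes, dtype=float)
    gx1, gy1, gx2, gy2 = gt_box
    inter_w = np.clip(np.minimum(boxes[:, 2], gx2) - np.maximum(boxes[:, 0], gx1), 0, None)
    inter_h = np.clip(np.minimum(boxes[:, 3], gy2) - np.maximum(boxes[:, 1], gy1), 0, None)
    inter = inter_w * inter_h
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    gt_area = (gx2 - gx1) * (gy2 - gy1)
    return inter / (areas + gt_area - inter)

def label_default_boxes(default_boxes, gt_box, pos_thresh=0.5, neg_thresh=0.3):
    """Label boxes 1 (positive), 0 (negative), or -1 (ignored in training).

    The thresholds are assumed values, not taken from the paper.
    """
    overlaps = iou_with_boxes(default_boxes, gt_box)
    labels = np.full(len(overlaps), -1)
    labels[overlaps >= pos_thresh] = 1
    labels[overlaps < neg_thresh] = 0
    return labels, overlaps

gt_attention_box = (100, 100, 300, 300)
default_boxes = [
    (100, 100, 300, 300),  # identical to the ground truth
    (0, 0, 50, 50),        # no overlap
    (150, 100, 350, 300),  # shifted right by 50 px
    (200, 100, 400, 300),  # shifted right by 100 px
]
labels, overlaps = label_default_boxes(default_boxes, gt_attention_box)
```

In this toy example the first and third boxes become positives, the second a negative, and the fourth falls between the two thresholds and is ignored.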

This review was generated by AI and reviewed by a human editor.