QUICK REVIEW

[Paper Digest] Scribbler: Controlling Deep Image Synthesis with Sketch and Color

Patsorn Sangkloy, Jingwan Lu|arXiv (Cornell University)|Dec 2, 2016

Generative Adversarial Networks and Image Synthesis | References: 44 | Cited by: 26

TL;DR

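Scribbler proposes a feed-forward conditional generative adversarial network (conditional GAN) that generates high-quality, diverse, and realistic images from sparse user sketches and color strokes, with support for real-time interactive editing. By combining adversarial training with user-supplied sketch and color guidance, the method achieves better realism and controllability than recent sketch-to-image approaches on faces, cars, and bedrooms, and it also supports controllable image colorization.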

ABSTRACT

Recently, there have been several promising methods to generate realistic imagery from deep convolutional networks. These methods sidestep the traditional computer graphics rendering pipeline and instead generate imagery at the pixel level by learning from large collections of photos (e.g. faces or bedrooms). However, these methods are of limited utility because it is difficult for a user to control what the network produces. In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces. We demonstrate a sketch based image synthesis system which allows users to 'scribble' over the sketch to indicate preferred color for objects. Our network can then generate convincing images that satisfy both the color and the sketch constraints of user. The network is feed-forward which allows users to see the effect of their edits in real time. We compare to recent work on sketch to image synthesis and show that our approach can generate more realistic, more diverse, and more controllable outputs. The architecture is also effective at user-guided colorization of grayscale images.

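Motivation and Goals

Let non-expert users generate realistic images through intuitive sketch and color controls.
Address the lack of controllability in existing deep image generation methods, which rely solely on sampling from a latent space.
Build a fast, interactive system that gives real-time feedback while the user edits.
Extend deep image generation beyond faces to categories such as cars and bedrooms, improving diversity and realism.
Show that the framework is effective for controllable image colorization guided by sparse color strokes.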

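Proposed Method

A conditional generative adversarial network is trained to generate images from an input sketch and sparse color strokes.
Training proceeds in two stages: the network is first optimized with content losses (a pixel loss and a feature loss on VGG-19 activations), then fine-tuned with an adversarial loss.
The feature loss is computed at the ReLU2-2 layer of VGG-19 to preserve fine sketch detail.
The adversarial loss is weighted heavily (≈1e8) for photo reconstruction and moderately (≈1e5) for colorization, balancing realism against control.
Data augmentation with diverse sketch styles, including synthetic sketches and imperfect hand-drawn sketches, improves robustness.
The generator is a feed-forward network, which enables real-time inference and interactive editing.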
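
The two-stage schedule and the adversarial weights listed above can be sketched as a weighted sum of scalar loss terms. This is only an illustration, not the authors' implementation: the LossTerms container, the total_loss function, the task names, and the default pixel and feature weights of 1.0 are all assumptions introduced here. Only the approximate adversarial weights (≈1e8 and ≈1e5) come from the summary.

```python
from dataclasses import dataclass

# Approximate adversarial weights quoted in the summary above.
# The task names used as keys are hypothetical labels.
ADV_WEIGHT = {"photo": 1e8, "colorization": 1e5}


@dataclass
class LossTerms:
    """Scalar loss values for one batch (hypothetical container)."""
    pixel: float        # pixel-space distance to the ground-truth image
    feature: float      # distance between VGG-19 ReLU2-2 activations
    adversarial: float  # generator-side adversarial loss


def total_loss(terms: LossTerms, stage: int, task: str = "photo",
               w_pixel: float = 1.0, w_feature: float = 1.0) -> float:
    """Combine the loss terms for a given training stage.

    Stage 1 uses only the content losses (pixel + feature).
    Stage 2 adds the adversarial term for fine-tuning.
    The defaults for w_pixel and w_feature are assumptions.
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    content = w_pixel * terms.pixel + w_feature * terms.feature
    if stage == 1:
        return content
    return content + ADV_WEIGHT[task] * terms.adversarial


# Example values are made up purely to exercise the function.
terms = LossTerms(pixel=0.5, feature=0.25, adversarial=1e-8)
stage1 = total_loss(terms, stage=1)
stage2_photo = total_loss(terms, stage=2, task="photo")
stage2_color = total_loss(terms, stage=2, task="colorization")
```

With the same raw adversarial value, the photo setting lets the adversarial term dominate far more than the colorization setting does, which is one way to read the realism-versus-control trade-off described above.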

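Experimental Results

Research Questions

RQ1: Can a deep generative model produce realistic, diverse, and controllable images from sparse sketches and color strokes?
RQ2: How much does adversarial training improve image quality and realism over non-adversarial baselines?
RQ3: Does the model generalize to diverse sketch styles, including imperfect hand-drawn sketches?
RQ4: To what extent can a single architecture support both sketch-to-image synthesis and grayscale-to-color conversion?
RQ5: How can user control be preserved while minimizing the adversarial loss's effect on rare or non-standard color choices?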

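Key Findings

The proposed method outperforms earlier sketch-to-image approaches, including those based on optimization at inference time, in resolution, diversity, and realism.
The feed-forward architecture supports real-time interaction, giving immediate visual feedback as the user edits sketch and color.
The model generalizes well to both imperfect hand-drawn sketches and synthetic sketches, showing robustness to input variation.
The system achieves controllable colorization: sparse color strokes guide the network toward semantically plausible color assignments.
Despite these gains, color sometimes bleeds past object boundaries, and the adversarial loss makes it hard to fully preserve rare colors chosen by the user.
The two-stage training procedure (content losses first, adversarial fine-tuning second) yields better image quality and faster convergence.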

This digest was generated by AI and reviewed by a human editor.