QUICK REVIEW

[Paper Review] Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Xingang Pan, Ayush Tewari | arXiv (Cornell University) | May 18, 2023
Generative Adversarial Networks and Image Synthesis | References: 67 | Cited by: 11
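One-sentence summary

DragGAN enables interactive, precise point-level manipulation of GAN-generated images: the user drags handle points toward target positions, and the method combines feature-based motion supervision with GAN-feature-based point tracking to move them there.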

ABSTRACT

Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig.1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.

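Research motivation and goals

  • Achieve flexible, precise, and general controllability of GANs without relying on manual annotations or 3D priors.
  • Enable multi-point, user-specified edits by dragging handle points to target points on the image.
  • Develop motion supervision and point tracking that exploit discriminative GAN features, with no additional networks.
  • Support region-specific edits through masks, and edit real images through GAN inversion.
  • Demonstrate effectiveness across diverse categories (animals, humans, cars, landscapes) and compare against existing methods.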

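Proposed method

  • Use discriminative GAN feature maps (block 6 of StyleGAN2) as the editing space, and apply a shifted patch loss that moves handle points toward target points through latent-code optimization.
  • Optimize the latent code w (in W or W+) with small-step motion supervision, updating only the first ~6 layers to preserve appearance while advancing the handle points toward their targets.
  • Track handle points by nearest-neighbor search in the current GAN feature space F', using the initial point features from F0; this gives robust, fast point tracking without an extra tracking network.
  • Alternate motion supervision and point tracking until every handle point reaches its target, optionally constraining edits with a user-defined mask of the movable region.
  • Provide a GUI for interactive editing, and edit real images through GAN inversion (e.g., PTI), which maps a real photo into the GAN latent space for manipulation.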
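The nearest-neighbor tracking step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`l1_distance`, `track_point`), the inclusive square search window, the use of an L1 distance, and the nested-list feature map are all assumptions made for the example.

```python
from typing import List, Tuple

Feature = List[float]
FeatureMap = List[List[Feature]]  # indexed as feature_map[y][x] -> feature vector


def l1_distance(a: Feature, b: Feature) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def track_point(feature_map: FeatureMap, handle: Tuple[int, int],
                initial_feature: Feature, radius: int) -> Tuple[int, int]:
    """Return the (x, y) position inside a square window around `handle`
    whose feature is closest to `initial_feature`.

    `feature_map` plays the role of the current feature map F', and
    `initial_feature` is the feature read from F0 at the original handle
    position. The window shape and distance are assumptions of this sketch.
    """
    height, width = len(feature_map), len(feature_map[0])
    hx, hy = handle
    best, best_dist = handle, float("inf")
    # Clamp the search window to the bounds of the feature map.
    for y in range(max(0, hy - radius), min(height, hy + radius + 1)):
        for x in range(max(0, hx - radius), min(width, hx + radius + 1)):
            dist = l1_distance(feature_map[y][x], initial_feature)
            if dist < best_dist:
                best, best_dist = (x, y), dist
    return best


# Toy check: a 5x5 map of 2-channel features. After an optimization step the
# handle's content has moved from (2, 2) to (3, 2).
toy_map = [[[0.0, 0.0] for _ in range(5)] for _ in range(5)]
toy_map[2][3] = [1.0, 2.0]
f0 = [1.0, 2.0]  # feature of the handle point in the initial map F0
new_handle = track_point(toy_map, handle=(2, 2), initial_feature=f0, radius=1)
print(new_handle)  # prints (3, 2)
```

Because the search always compares against the initial feature rather than the previous step's feature, small tracking errors do not accumulate from one iteration to the next in this sketch.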
Figure 1. Our approach DragGAN allows users to "drag" the content of any GAN-generated images. Users only need to click a few handle points (red) and target points (blue) on the image, and our approach will move the handle points to precisely reach their corresponding target points. Users can op

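Experimental results

Research questions

  • RQ1: Can multi-point, precise, region-aware point-level manipulation be achieved on GAN-generated images without domain-specific priors or additional networks?
  • RQ2: Does using the discriminative GAN feature space for both motion supervision and point tracking yield accurate, efficient interactive editing?
  • RQ3: How does DragGAN compare with prior methods (e.g., UserControllableLT, RAFT, PIPs) in accuracy and realism across object categories?
  • RQ4: Can real images be edited by first inverting them into the GAN latent space and then applying point-based manipulation?
  • RQ5: How does masking the movable region affect the stability and fidelity of edits?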

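Main findings

  • DragGAN achieves accurate manipulation across domains such as animals, humans, cars, and landscapes by driving handle points to their target points.
  • In face landmark manipulation and paired image reconstruction, it is more accurate than UserControllableLT while maintaining high image quality (lower FID).
  • GAN-feature-based point tracking (nearest neighbor in F') achieves higher tracking accuracy on GAN-generated frames than RAFT or PIPs.
  • Masking the movable region enables region-specific edits, while regions outside the mask stay fixed.
  • GAN inversion makes real-image editing possible: the image is mapped into the GAN latent space before point-based edits are applied.
  • The method shows some ability to extrapolate outside the training distribution, but artifacts can appear beyond it; limitations include tracking drift in texture-poor regions and potential privacy concerns.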
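The shifted patch loss and the mask constraint summarized above can be written compactly. The following is a sketch consistent with that description, not a quotation of the paper's equation. Here p_i and t_i are the i-th handle and target points, Ω_1(p_i, r_1) is a small neighborhood of radius r_1 around p_i, d_i is the unit step toward the target, F and F_0 are the current and initial feature maps, M is a binary mask equal to 1 inside the movable region, λ weights the mask term, and sg(·) denotes a stop-gradient. The symbols r_1, λ, M, and sg are notation assumed for this sketch.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{n} \sum_{q \in \Omega_1(p_i, r_1)}
  \bigl\| \operatorname{sg}\!\bigl(F(q)\bigr) - F(q + d_i) \bigr\|_1
  \;+\; \lambda \, \bigl\| (F - F_0) \odot (1 - M) \bigr\|_1,
\qquad
d_i = \frac{t_i - p_i}{\lVert t_i - p_i \rVert_2}
```

Minimizing the first term makes the features at q + d_i match those currently at q, so the content around each handle point shifts one small step toward its target. The second term penalizes any feature change where M = 0, which is what keeps the region outside the mask fixed.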
Figure 2. Overview of our pipeline. Given a GAN-generated image, the user only needs to set several handle points (red dots), target points (blue dots), and optionally a mask denoting the movable region during editing (brighter area). Our approach iteratively performs motion supervision (Sec. 3.2


This review was generated by AI and reviewed by human editors.