QUICK REVIEW

[Paper Review] GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning

Zhenyu Xie, Zaiyu Huang|arXiv (Cornell University)|Mar 24, 2023

Topic: 3D Shape Modeling and Analysis | Cited by 9

One-Sentence Summary

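GP-VTON introduces a Local-Flow Global-Parsing (LFGP) warping module and a Dynamic Gradient Truncation (DGT) training strategy to achieve high-fidelity, semantically correct garment warping for multi-category virtual try-on, outperforming state-of-the-art methods on high-resolution benchmarks.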

ABSTRACT

Image-based Virtual Try-ON aims to transfer an in-shop garment onto a specific person. Existing methods employ a global warping module to model the anisotropic deformation for different garment parts, which fails to preserve the semantic information of different parts when receiving challenging inputs (e.g., intricate human poses, difficult garments). Moreover, most of them directly warp the input garment to align with the boundary of the preserved region, which usually requires texture squeezing to meet the boundary shape constraint and thus leads to texture distortion. These shortcomings hinder existing methods from being used in real-world applications. To address these problems and take a step towards real-world virtual try-on, we propose a General-Purpose Virtual Try-ON framework, named GP-VTON, by developing an innovative Local-Flow Global-Parsing (LFGP) warping module and a Dynamic Gradient Truncation (DGT) training strategy. Specifically, compared with the previous global warping mechanism, LFGP employs local flows to warp garment parts individually, and assembles the local warped results via the global garment parsing, resulting in reasonable warped parts and a semantically correct intact garment even with challenging inputs. On the other hand, our DGT training strategy dynamically truncates the gradient in the overlap area, so the warped garment is no longer required to meet the boundary constraint, which effectively avoids the texture squeezing problem. Furthermore, our GP-VTON can be easily extended to multi-category scenarios and jointly trained by using data from different garment categories. Extensive experiments on two high-resolution benchmarks demonstrate our superiority over the existing state-of-the-art methods.
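The local-flow/global-parsing idea described in the abstract can be illustrated with a minimal NumPy sketch. It assumes each garment part already has its own dense flow and that a target parsing map is given; in GP-VTON these are predicted by networks that are not reproduced here. The function names (`warp_with_flow`, `lfgp_assemble`), the nearest-neighbour sampling (real implementations typically sample bilinearly), and the label conventions are hypothetical choices for illustration only.

```python
import numpy as np


def warp_with_flow(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp img (H, W, C) with a dense flow field (H, W, 2).

    Output pixel (y, x) is read from img at (y + flow[y, x, 0], x + flow[y, x, 1]),
    rounded to the nearest pixel and clamped to the image border.
    """
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs + flow[..., 1]), 0, w - 1).astype(int)
    return img[src_y, src_x]


def lfgp_assemble(garment, garment_parts, part_flows, target_parsing):
    """Hypothetical helper: warp each part with its own local flow, then
    stitch the warped parts together according to a target parsing map.

    garment:        (H, W, C) in-shop garment image
    garment_parts:  (H, W) int labels on the in-shop garment, k = part index
                    (e.g. 0 = left sleeve, 1 = right sleeve, 2 = torso)
    part_flows:     list of K flows, each (H, W, 2), one per part
    target_parsing: (H, W) int labels on the person; -1 = not garment
    """
    out = np.zeros_like(garment)
    for k, flow in enumerate(part_flows):
        # Isolate part k so its flow only moves that part's pixels.
        part_only = garment * (garment_parts == k)[..., None]
        warped_k = warp_with_flow(part_only, flow)
        # The parsing decides where warped part k is visible on the person.
        region = target_parsing == k
        out[region] = warped_k[region]
    return out
```

Because each part is moved by its own flow, a sleeve can deform independently of the torso, and the parsing picks exactly one warped part per pixel, so overlapping parts are not blended into each other.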

Motivation and Goals

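Address the limitations of global warping in VTON, such as semantic loss and texture distortion under challenging poses and garments.
Develop a unified framework that supports multi-category VTON (upper-body garments, lower-body garments, dresses) with high realism.
Improve garment warping by preserving part semantics and avoiding boundary-induced texture squeezing.
Propose a training strategy that stabilizes the trade-off between warping and texture preservation across diverse inputs.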

Proposed Method

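Local-Flow Global-Parsing (LFGP) warping: garment parts are warped locally and then combined via a global garment parsing to form a coherent warped garment.
Cascaded local flow estimation for each garment part (left sleeve, right sleeve, torso), using multi-scale features from two separate encoders (one for the person, one for the garment), complemented by a global parsing module that ensures seamless assembly of the parts.
A Dynamic Gradient Truncation (DGT) training strategy that adaptively truncates gradients in the preserved region depending on the wearing style (e.g., tucked in or tightened), preventing texture squeezing or stretching.
A Res-UNet-based try-on generator that fuses the warped garment, skin/color maps, and preserved-region guidance to synthesize the final try-on image.
An extension of GP-VTON to multi-category VTON by applying the three-part garment partition (left sleeve, right sleeve, torso) to tops, bottoms, and dresses, enabling unified training across garment categories.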
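The following NumPy sketch illustrates the general idea of dynamic gradient truncation as summarized above: a per-sample switch decides whether pixels in the overlap with the preserved region are excluded from the warping loss. It is not the paper's implementation. The switching criterion (a relative change in bounding-box height/width ratio, named `r_diff` to echo the R_diff mentioned in the findings), the threshold `0.1`, and the L1 loss are all assumptions. NumPy has no autograd; the point is that a pixel excluded from the loss contributes zero gradient once the same computation runs in an autograd framework.

```python
import numpy as np


def height_width_ratio(mask: np.ndarray) -> float:
    """Height / width of the bounding box of a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    return (ys.max() - ys.min() + 1) / (xs.max() - xs.min() + 1)


def r_diff(shop_mask: np.ndarray, worn_mask: np.ndarray) -> float:
    """Assumed criterion: relative aspect-ratio change between the in-shop
    garment and the garment as worn (large for, e.g., a tucked-in shirt)."""
    r_shop = height_width_ratio(shop_mask)
    r_worn = height_width_ratio(worn_mask)
    return abs(r_worn - r_shop) / r_shop


def dgt_l1_loss(warped, target, overlap_mask, truncate):
    """Mean per-pixel L1 loss. If `truncate` is True, pixels in the overlap
    with the preserved region are dropped, so under autograd they would
    send no gradient back to the predicted flow."""
    weight = np.ones(warped.shape[:2])
    if truncate:
        weight[overlap_mask] = 0.0
    diff = np.abs(warped - target).sum(axis=-1)  # per-pixel L1 over channels
    return float((weight * diff).sum() / max(weight.sum(), 1.0))


def training_loss(warped, target, overlap_mask, shop_mask, worn_mask,
                  threshold=0.1):  # hypothetical threshold
    # "Dynamic": the truncation decision is made per training sample.
    truncate = r_diff(shop_mask, worn_mask) > threshold
    return dgt_l1_loss(warped, target, overlap_mask, truncate)
```

Under these assumptions, a sample whose worn garment looks much shorter than the in-shop garment has its overlap pixels removed from the loss, so the warp is not pushed to squeeze the texture into the preserved region's boundary; other samples keep the full boundary supervision.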

Experimental Results

Research Questions

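RQ1: Can part-level local garment warping combined with global parsing achieve semantically correct warping under complex poses?
RQ2: Compared with fixed truncation or no truncation, does Dynamic Gradient Truncation improve texture preservation around the preserved region?
RQ3: How well does GP-VTON generalize to multi-category virtual try-on (upper-body garments, lower-body garments, dresses) while maintaining visual realism and semantic correctness?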

Key Findings

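Method	SSIM ↑	FID ↓	LPIPS ↓	mIoU ↑	HE (%) ↑
PF-AFN	0.8858	9.475	0.0871	0.8412	14.9
FS-VTON	0.8829	9.552	0.0906	0.8357	8.8
HR-VITON	0.8623	16.21	0.1094	0.6949	9.1
SDAFN	0.8821	9.400	0.0922	0.5927	16.3
GP-VTON (Ours)	0.8939	9.197	0.0799	0.8764	50.9
(↑ higher is better, ↓ lower is better; HE = human evaluation, share of user preferences, summing to 100% across methods)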

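GP-VTON consistently outperforms the baseline methods on SSIM, FID, LPIPS, and mIoU on both VITON-HD and DressCode.
Compared with the baselines, GP-VTON achieves a markedly higher mIoU (0.8764 vs. at most 0.8412) and the highest HE score (50.9%), indicating better semantic correctness and perceptual realism.
Ablation studies show that local flows improve SSIM/LPIPS and mIoU over a single global flow, and that global parsing effectively eliminates artifacts where garment parts overlap.
Dynamic Gradient Truncation (DGT) reduces texture distortion, outperforming static gradient strategies on R_diff and texture preservation.
The method remains effective on high-resolution benchmarks, suggesting promise for multi-category VTON.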
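mIoU is used above as a proxy for semantic correctness. A minimal sketch of the standard mean-IoU computation over two integer label maps follows; which parsing maps the paper actually compares, and how it treats classes absent from both maps, are assumptions here (this version simply skips such classes).

```python
import numpy as np


def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, g).sum() / union)
    if not ious:
        return float("nan")  # no class present at all
    return float(np.mean(ious))
```

For example, `mean_iou(np.array([[0, 0], [1, 1]]), np.array([[0, 1], [1, 1]]), 2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1, giving 7/12.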


This review was generated by AI and reviewed by human editors.