QUICK REVIEW

[Paper Review] Network Fusion for Content Creation with Conditional INNs

Robin Rombach, Patrick Esser | arXiv (Cornell University) | May 27, 2020

Generative Adversarial Networks and Image Synthesis | References: 35 | Cited by: 3

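One-Sentence Summary

This paper proposes a network fusion method based on conditional invertible neural networks (INNs) that reuses pretrained, task-specific models (such as BERT for text and BigGAN for images) for new content creation tasks (such as text-to-image generation) without fine-tuning or retraining them. It learns a generative model of one expert's hidden representations, conditioned on the representations of another expert, which enables efficient, controllable, low-resource content synthesis across modalities.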

ABSTRACT

Artificial Intelligence for Content Creation has the potential to reduce the amount of manual content creation work significantly. While automation of laborious work is welcome, it is only useful if it allows users to control aspects of the creative process when desired. Furthermore, widespread adoption of semi-automatic content creation depends on low barriers regarding the expertise, computational budget and time required to obtain results and experiment with new techniques. With state-of-the-art approaches relying on task-specific models, multi-GPU setups and weeks of training time, we must find ways to reuse and recombine them to meet these requirements. Instead of designing and training methods for controllable content creation from scratch, we thus present a method to repurpose powerful, existing models for new tasks, even though they have never been designed for them. We formulate this problem as a translation between expert models, which includes common content creation scenarios, such as text-to-image and image-to-image translation, as a special case. As this translation is ambiguous, we learn a generative model of hidden representations of one expert conditioned on hidden representations of the other expert. Working on the level of hidden representations makes optimal use of the computational effort that went into the training of the expert model to produce these efficient, low-dimensional representations. Experiments demonstrate that our approach can translate from BERT, a state-of-the-art expert for text, to BigGAN, a state-of-the-art expert for images, to enable text-to-image generation, which neither of the experts can perform on its own. Additional experiments show the wide applicability of our approach across different conditional image synthesis tasks and improvements over existing methods for image modifications.

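Motivation and Objectives

Enable controllable, semi-automatic content creation with low barriers in compute and expertise.
Address the limitation that task-specific models cannot easily be reused for new content creation tasks.
Reduce training time and resource requirements by reusing existing pretrained expert models instead of training from scratch.
Translate between modalities (such as text to image) using only the hidden representations of pretrained models.
Provide a general framework for conditional image synthesis that improves on existing methods in flexibility and performance.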

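Proposed Method

Formulate content creation as a translation task between the hidden representations of pretrained expert models.
Use a conditional INN to model the generative distribution of one expert's hidden representations, conditioned on the representations of the other expert.
Train the INN on paired hidden representations extracted from the source expert and the target expert.
Operate only on low-dimensional, precomputed hidden representations, to reuse as much as possible of the computation already invested in the existing models.
Use the trained INN to generate outputs for unseen inputs, giving zero-shot transfer.
Support diverse conditional image synthesis tasks by adapting the same framework to different pairs of experts.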
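
The core building block of such a model can be illustrated with a toy example. The sketch below is not the authors' implementation: it is a minimal, pure-Python conditional affine coupling layer, one common way to build an invertible network, with random fixed weights standing in for trained parameters. The class name `ConditionalAffineCoupling`, the helper names, and the direction convention (`forward` maps a target representation z to a latent v, `inverse` maps back) are all assumptions made for illustration. It shows the two properties the method relies on: the mapping is exactly invertible for a fixed condition c, and its log-determinant is cheap to compute, so a likelihood can be evaluated by change of variables.

```python
import math
import random


def _linear(weights, bias, inputs):
    """Plain dense layer: weights is a list of rows, one row per output."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


class ConditionalAffineCoupling:
    """One conditional affine coupling block (hypothetical toy version).

    The input z is split into halves (z1, z2). z1 passes through unchanged;
    z2 is scaled and shifted by functions of z1 and the condition c.
    """

    def __init__(self, dim, cond_dim, seed=0):
        assert dim % 2 == 0, "dim must be even so it splits into equal halves"
        rng = random.Random(seed)
        self.half = dim // 2
        n_in = self.half + cond_dim

        # Random fixed weights stand in for trained parameters.
        def init():
            return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                    for _ in range(self.half)]

        self.w_scale, self.w_shift = init(), init()
        self.b_scale = [0.0] * self.half
        self.b_shift = [0.0] * self.half

    def _scale_shift(self, z1, c):
        h = list(z1) + list(c)
        # tanh keeps the log-scale bounded for numerical stability.
        log_s = [math.tanh(v) for v in _linear(self.w_scale, self.b_scale, h)]
        t = _linear(self.w_shift, self.b_shift, h)
        return log_s, t

    def forward(self, z, c):
        """Map representation z to latent v; also return log|det J|."""
        z1, z2 = z[:self.half], z[self.half:]
        log_s, t = self._scale_shift(z1, c)
        v2 = [x * math.exp(s) + b for x, s, b in zip(z2, log_s, t)]
        return list(z1) + v2, sum(log_s)

    def inverse(self, v, c):
        """Map latent v back to a representation z, given the same c."""
        v1, v2 = v[:self.half], v[self.half:]
        log_s, t = self._scale_shift(v1, c)
        z2 = [(y - b) * math.exp(-s) for y, s, b in zip(v2, log_s, t)]
        return list(v1) + z2


def negative_log_likelihood(layer, z, c):
    """NLL of z given c under a standard normal prior on v."""
    v, log_det = layer.forward(z, c)
    log_prior = (-0.5 * sum(x * x for x in v)
                 - 0.5 * len(v) * math.log(2 * math.pi))
    return -(log_prior + log_det)


def sample(layer, c, dim, rng):
    """Draw v ~ N(0, I) and invert it to get one plausible z for condition c."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return layer.inverse(v, c)
```

In a setting like the one described above, c would be a hidden representation from the source expert (for example a text embedding) and z a hidden representation of the target expert (for example a generator latent). Drawing several v for the same c yields several different z, which reflects the ambiguity of the translation. A single coupling block leaves half of z untouched, so practical models typically stack several blocks with permutations in between.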

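Experimental Results

Research Questions

RQ1: Can pretrained, task-specific models be reused for new content creation tasks without fine-tuning or retraining?
RQ2: How effective is translating hidden representations with a conditional INN for cross-modal generation (such as text to image)?
RQ3: Does the method achieve competitive performance on image modification and conditional synthesis tasks compared with existing methods?
RQ4: To what extent does the framework lower the compute and expertise barriers to content creation?
RQ5: How well does the method generalize across different model architectures and content creation scenarios?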

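Key Findings

By fusing BERT (the text expert) and BigGAN (the image expert), the method achieves text-to-image generation, a task that neither model can perform on its own.
The method achieves competitive results in conditional image synthesis and improves on existing methods in flexibility and controllability.
Experiments show that the framework generalizes beyond text-to-image translation to a range of conditional image synthesis tasks.
Working on hidden representations gives efficient inference and needs only minimal additional training, which substantially reduces computational cost.
Because it reuses pretrained models without fine-tuning or multi-GPU training, the method supports low-resource experimentation.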

This review was generated by AI and reviewed by a human editor.