QUICK REVIEW

[Paper Review] Network Fusion for Content Creation with Conditional INNs

Robin Rombach, Patrick Esser | arXiv (Cornell University) | May 27, 2020

Generative Adversarial Networks and Image Synthesis · References: 35 · Citations: 3

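One-Line Summary

This paper proposes a network fusion method based on conditional invertible neural networks (cINNs). It reuses pretrained expert models (e.g., BERT for text, BigGAN for images) for new content creation tasks such as text-to-image generation, without retraining the experts. By training a generative model of one expert's hidden representations conditioned on another expert's hidden representations, it enables efficient, controllable, and low-resource content synthesis across modalities.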

ABSTRACT

Artificial Intelligence for Content Creation has the potential to reduce the amount of manual content creation work significantly. While automation of laborious work is welcome, it is only useful if it allows users to control aspects of the creative process when desired. Furthermore, widespread adoption of semi-automatic content creation depends on low barriers regarding the expertise, computational budget and time required to obtain results and experiment with new techniques. With state-of-the-art approaches relying on task-specific models, multi-GPU setups and weeks of training time, we must find ways to reuse and recombine them to meet these requirements. Instead of designing and training methods for controllable content creation from scratch, we thus present a method to repurpose powerful, existing models for new tasks, even though they have never been designed for them. We formulate this problem as a translation between expert models, which includes common content creation scenarios, such as text-to-image and image-to-image translation, as a special case. As this translation is ambiguous, we learn a generative model of hidden representations of one expert conditioned on hidden representations of the other expert. Working on the level of hidden representations makes optimal use of the computational effort that went into the training of the expert model to produce these efficient, low-dimensional representations. Experiments demonstrate that our approach can translate from BERT, a state-of-the-art expert for text, to BigGAN, a state-of-the-art expert for images, to enable text-to-image generation, which neither of the experts can perform on its own. Additional experiments show the wide applicability of our approach across different conditional image synthesis tasks and improvements over existing methods for image modifications.
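The conditional generative model described in the abstract can be written with the standard change-of-variables formula for conditional normalizing flows. The sketch below uses our own notation, which is an assumption and not copied from the paper: $z_s$ is the source expert's hidden code (e.g., from BERT), $z_t$ is the target expert's code (e.g., a BigGAN latent), and $\tau$ is the cINN that maps $z_t$ to a standard Gaussian variable $v$ given $z_s$.

```latex
v = \tau(z_t \mid z_s), \qquad v \sim \mathcal{N}(0, I)

\log p(z_t \mid z_s) = \log \mathcal{N}\big(\tau(z_t \mid z_s);\, 0, I\big)
  + \log \left| \det \frac{\partial \tau(z_t \mid z_s)}{\partial z_t} \right|

\mathcal{L} = \mathbb{E}_{(z_s,\, z_t)}\big[ -\log p(z_t \mid z_s) \big]
```

Sampling draws $v \sim \mathcal{N}(0, I)$ and computes $z_t = \tau^{-1}(v \mid z_s)$, which the target expert's generator turns into an output. Because the translation is ambiguous (one sentence fits many images), different draws of $v$ yield different valid translations for the same $z_s$.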

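Motivation and Goals

To enable controllable, semi-automatic content creation with low barriers in expertise, computational budget, and time.
To address the limitation that task-specific models cannot easily be reused for new content creation tasks.
To reduce training time and resource consumption, compared with training from scratch, by reusing existing pretrained expert models.
To enable translation between modalities (e.g., text to image) using only the hidden representations of pretrained models.
To provide a general framework for conditional image synthesis that is more flexible than existing methods and improves on them for image modification.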

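Proposed Method

Formulate content creation as translation between the hidden representations of pretrained expert models.
Model the distribution of one expert's hidden representation, conditioned on the other expert's hidden representation, with a conditional invertible neural network (cINN).
Train the cINN on paired hidden representations from the source and target experts.
Maximize reuse of the computation already invested in the experts by working only on low-dimensional, precomputed hidden representations.
Apply the trained cINN to new inputs, enabling zero-shot transfer to new tasks.
Support a range of conditional image synthesis tasks by applying the same framework to different pairs of expert models.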
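The method steps above can be sketched in Python. This is a minimal, hypothetical NumPy illustration of a conditional invertible network built from affine coupling blocks, a standard cINN design that we assume here; the paper's actual architecture, dimensions, and training setup may differ. `ConditionalAffineCoupling`, `ConditionalFlow`, the random weights, and the dimensions are all illustrative. The hidden codes are random stand-ins for precomputed expert representations, and gradient-based training of `nll` (which would need an autodiff framework) is omitted.

```python
import numpy as np


class ConditionalAffineCoupling:
    """Affine coupling block whose scale and shift depend on a condition.

    Hypothetical sketch: s(.) and t(.) are single random linear maps
    (tanh-bounded for the scale), not the subnetworks used in the paper.
    """

    def __init__(self, dim, cond_dim, rng):
        assert dim % 2 == 0, "this sketch splits the target code into equal halves"
        self.d = dim // 2
        in_dim = self.d + cond_dim  # passive half + source-expert condition
        self.W_s = rng.normal(scale=0.1, size=(in_dim, self.d))
        self.W_t = rng.normal(scale=0.1, size=(in_dim, self.d))

    def _scale_shift(self, passive, cond):
        h = np.concatenate([passive, cond], axis=1)
        return np.tanh(h @ self.W_s), h @ self.W_t

    def forward(self, x, cond):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self._scale_shift(x1, cond)
        y2 = x2 * np.exp(s) + t  # transform only the active half
        # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
        return np.concatenate([x1, y2], axis=1), s.sum(axis=1)

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s, t = self._scale_shift(y1, cond)  # y1 == x1, so s and t can be recomputed
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2], axis=1)


class ConditionalFlow:
    """Stack of coupling blocks: v = tau(z_t | z_s) with v ~ N(0, I)."""

    def __init__(self, dim, cond_dim, n_blocks=4, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.blocks = [ConditionalAffineCoupling(dim, cond_dim, rng) for _ in range(n_blocks)]

    def forward(self, z_t, z_s):
        v, logdet = z_t, np.zeros(len(z_t))
        for block in self.blocks:
            v, ld = block.forward(v, z_s)
            logdet = logdet + ld
            v = v[:, ::-1]  # fixed permutation so both halves get transformed; |det| = 1
        return v, logdet

    def inverse(self, v, z_s):
        z = v
        for block in reversed(self.blocks):
            z = block.inverse(z[:, ::-1], z_s)  # undo the permutation, then the block
        return z

    def nll(self, z_t, z_s):
        """Mean negative log-likelihood -log p(z_t | z_s) via change of variables."""
        v, logdet = self.forward(z_t, z_s)
        log_gauss = -0.5 * (v ** 2).sum(axis=1) - 0.5 * self.dim * np.log(2 * np.pi)
        return float(-(log_gauss + logdet).mean())

    def sample(self, z_s, rng):
        """Draw one target code per condition: z_t = tau^{-1}(v | z_s)."""
        v = rng.standard_normal((len(z_s), self.dim))
        return self.inverse(v, z_s)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z_s = rng.standard_normal((8, 16))  # stand-in for source-expert codes (e.g. BERT)
    z_t = rng.standard_normal((8, 6))   # stand-in for target-expert codes (e.g. BigGAN)
    flow = ConditionalFlow(dim=6, cond_dim=16)
    v, _ = flow.forward(z_t, z_s)
    print("invertible:", np.allclose(flow.inverse(v, z_s), z_t))
    print("NLL:", flow.nll(z_t, z_s))
```

Because the experts stay frozen, only the flow's parameters would be trained, on cached pairs `(z_s, z_t)`. At inference, `sample` can be called repeatedly with the same `z_s` to obtain different target codes, reflecting that one text can match many images.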

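Experimental Results

Research Questions

RQ1: Can pretrained task-specific expert models be reused for new content creation tasks without retraining or fine-tuning?
RQ2: How effective is cINN-based translation between hidden representations for cross-modal generation such as text-to-image synthesis?
RQ3: Does the method achieve competitive performance on image modification and conditional synthesis tasks compared with existing approaches?
RQ4: To what extent does the framework lower the barriers of computational resources and expertise in content creation?
RQ5: How well does the method generalize across different model architectures and content creation scenarios?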

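Key Findings

Fusing BERT (a text expert) with BigGAN (an image expert) achieved text-to-image generation, a task neither model can perform on its own.
The method achieved competitive results in conditional image synthesis and improved on existing methods for image modification, while offering greater flexibility and control.
The experiments showed that the framework generalizes beyond text-to-image translation to a variety of conditional image synthesis tasks.
Working on hidden representations kept additional training to a minimum and enabled efficient inference at reduced computational cost.
Reusing pretrained models without fine-tuning or multi-GPU training made low-resource experimentation possible.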


This review was generated by AI and reviewed by a human editor.