QUICK REVIEW

[Paper Review] ReVersion: Diffusion-Based Relation Inversion from Images

Ziqi Huang, Tianxing Wu | arXiv (Cornell University) | 2023-03-23

Multimodal Machine Learning Applications | Citations: 8

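One-Line Summary

ReVersion learns a relation prompt from exemplar images by inverting a frozen diffusion model, guided by a preposition prior and relation-focal sampling, and then uses the extracted relation to generate new scenes in which objects interact.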

ABSTRACT

Diffusion models gain increasing popularity for their generative capabilities. Recently, there have been surging needs to generate customized images by inverting diffusion models from exemplar images, and existing inversion methods mainly focus on capturing object appearances (i.e., the "look"). However, how to invert object relations, another important pillar in the visual world, remains unexplored. In this work, we propose the Relation Inversion task, which aims to learn a specific relation (represented as "relation prompt") from exemplar images. Specifically, we learn a relation prompt with a frozen pre-trained text-to-image diffusion model. The learned relation prompt can then be applied to generate relation-specific images with new objects, backgrounds, and styles. To tackle the Relation Inversion task, we propose the ReVersion Framework. Specifically, we propose a novel "relation-steering contrastive learning" scheme to steer the relation prompt towards relation-dense regions, and disentangle it away from object appearances. We further devise "relation-focal importance sampling" to emphasize high-level interactions over low-level appearances (e.g., texture, color). To comprehensively evaluate this new task, we contribute the ReVersion Benchmark, which provides various exemplar images with diverse relations. Extensive experiments validate the superiority of our approach over existing methods across a wide range of visual relations. Our proposed task and method could be good inspirations for future research in various domains like generative inversion, few-shot learning, and visual relation detection.

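Research Motivation and Goals

Studies Relation Inversion, a new task of extracting the relation that a set of exemplar images has in common.
Learns a relation prompt in the text embedding space of a frozen pre-trained text-to-image diffusion model.
Disentangles the relation prompt from object appearance so that images can be synthesized flexibly around the relation.
Contributes the ReVersion Benchmark for evaluating Relation Inversion.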

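Proposed Method

Introduces a preposition prior that guides the relation prompt toward relation-dense regions of the text embedding space.
Develops a relation-steering contrastive learning scheme that pulls the relation prompt toward basis prepositions and pushes it away from non-preposition words.
Uses refined negatives that include descriptions of the exemplar objects to prevent appearance leakage.
Applies relation-focal importance sampling, which skews diffusion timestep sampling toward higher noise levels during denoising training so that optimization focuses on high-level interactions.
Optimizes the relation prompt with a joint objective that combines the steering loss with a noise-robust denoising loss.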
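
The steering step and joint objective listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes an InfoNCE-style contrastive loss over cosine similarities, and the function names `steering_loss` and `joint_objective`, the temperature of 0.07, and the steering weight of 0.01 are all hypothetical choices made for illustration.

```python
import numpy as np


def steering_loss(relation, positives, negatives, temperature=0.07):
    """InfoNCE-style loss (assumed form) for steering a relation embedding.

    relation:  (d,) embedding of the learnable relation prompt.
    positives: (P, d) embeddings of preposition words (pulled closer).
    negatives: (N, d) embeddings of non-preposition words (pushed away).
    """
    def normalize(x):
        # Unit-normalize so dot products become cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    r = normalize(np.asarray(relation, dtype=float))
    pos_logits = normalize(np.asarray(positives, dtype=float)) @ r / temperature
    neg_logits = normalize(np.asarray(negatives, dtype=float)) @ r / temperature

    # Subtract the largest logit before exponentiating for numerical stability.
    shift = max(pos_logits.max(), neg_logits.max())
    pos_sum = np.exp(pos_logits - shift).sum()
    neg_sum = np.exp(neg_logits - shift).sum()
    return float(-np.log(pos_sum / (pos_sum + neg_sum)))


def joint_objective(denoise_loss, steer_loss, steer_weight=0.01):
    """Weighted sum of the denoising loss and the steering loss.

    steer_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return denoise_loss + steer_weight * steer_loss


# Toy 3-D embeddings: positives cluster near axis 0, negatives near axis 1.
toy_positives = np.array([[1.0, 0.0, 0.1], [1.0, 0.1, 0.0]])
toy_negatives = np.array([[0.0, 1.0, 0.1], [0.1, 1.0, 0.0]])
toy_relation = np.array([1.0, 0.0, 0.0])

toy_steer = steering_loss(toy_relation, toy_positives, toy_negatives)
print(joint_objective(denoise_loss=0.5, steer_loss=toy_steer))
```

In this toy setup a relation embedding aligned with the positives yields a loss close to zero, while one aligned with the negatives yields a much larger loss, which is the pull and push behavior the scheme describes. In a real setup the embeddings would come from the frozen text encoder, and only the relation embedding would receive gradients.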

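Experimental Results

Research Questions

RQ1: Can a relation prompt learned from exemplar images that share a common relation be used to generate new scenes with new objects?
RQ2: Do the preposition prior and contrastive steering improve extraction of the high-level relation while keeping it disentangled from appearance?
RQ3: Does relation-focal importance sampling increase the focus on high-level interactions in diffusion-based inversion?
RQ4: How well does the learned relation prompt generalize to new entities and backgrounds?
RQ5: How do the proposed ReVersion components affect the relation accuracy and entity accuracy of generated images?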

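Key Results

The framework learns a relation prompt that makes it possible to generate new scenes in which entities interact through the extracted relation.
The preposition prior and relation steering improve the disentanglement of the relation from appearance, reducing leakage from the exemplar entities.
Relation-focal importance sampling biases optimization toward high-level interactions, improving relation accuracy and entity accuracy.
In qualitative and quantitative evaluations on the Relation Inversion task, ReVersion outperforms the baselines, namely plain text-to-image generation and Textual Inversion.
The dedicated ReVersion Benchmark provides diverse exemplar images and templates for evaluating the Relation Inversion task.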
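
To make the timestep skew behind relation-focal importance sampling concrete, the following is a minimal sketch of one way to sample diffusion timesteps with higher probability at higher noise levels. It assumes a cosine-shaped weighting proportional to `1 - alpha * cos(pi * t / T)`; the function names, the default `alpha=0.5`, and the 1000-step schedule are illustrative assumptions, and the exact schedule should be checked against the paper.

```python
import numpy as np


def timestep_probabilities(num_steps=1000, alpha=0.5):
    """Sampling probability for each timestep t = 0 .. num_steps - 1.

    Assumed weighting: p(t) proportional to 1 - alpha * cos(pi * t / num_steps),
    which grows monotonically with t, so noisier timesteps are drawn more often.
    alpha in (0, 1] controls how strong the skew is.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    t = np.arange(num_steps)
    weights = 1.0 - alpha * np.cos(np.pi * t / num_steps)
    # Normalize so the weights form a valid probability distribution.
    return weights / weights.sum()


def sample_timesteps(batch_size, num_steps=1000, alpha=0.5, rng=None):
    """Draw a batch of timesteps from the skewed distribution."""
    rng = np.random.default_rng() if rng is None else rng
    probs = timestep_probabilities(num_steps, alpha)
    return rng.choice(num_steps, size=batch_size, p=probs)


# Draw a small batch; large (noisy) timesteps appear more often than under
# uniform sampling.
batch = sample_timesteps(batch_size=8, rng=np.random.default_rng(0))
print(batch)
```

Under uniform sampling the expected timestep sits at the midpoint of the schedule; with this weighting it shifts toward the noisy end, where denoising depends more on global layout and interactions than on texture or color.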


This review was generated by AI and reviewed by a human editor.