QUICK REVIEW

[Paper Review] Edit Everything: A Text-Guided Generative System for Images Editing

Defeng Xie, Ruichen Wang|arXiv (Cornell University)|Apr 27, 2023
Video Analysis and Summarization|Cited by 7
One-Line Summary

TL;DR: Edit Everything combines Segment Anything, CLIP, and Stable Diffusion to edit images from text prompts, with a focus on Chinese-language prompts and on iterative editing for complex tasks; the code is publicly available.

ABSTRACT

We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.

Motivation and Objectives

  • Enable text-driven editing of images through a modular pipeline combining segmentation, ranking, and diffusion-based generation.
  • Leverage SAM to segment the image and CLIP to select the target segment based on the source prompt.
  • Guide Stable Diffusion with a target prompt to replace the selected segment, achieving high realism.
  • Enhance Chinese-language capabilities by training CLIP and Stable Diffusion on Chinese corpora.
  • Provide an iterative, stepwise editing approach for complex prompts and objects.

Proposed Method

  • Segment the input image with Segment Anything Model (SAM).
  • Rank segments using a CLIP model based on a given source prompt and select the highest-scoring target segment.
  • Generate the replacement object with Stable Diffusion guided by a target prompt.
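
The ranking step in the list above can be illustrated with a minimal sketch. It assumes that each SAM segment and the source prompt have already been mapped to CLIP embedding vectors, and that segments are compared to the prompt by cosine similarity, which is the usual way CLIP embeddings are scored; the review does not state the exact scoring rule used by the authors. The function names and the toy two-dimensional vectors are hypothetical and stand in for real model outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_segment(
    segment_embeddings: Sequence[Sequence[float]],
    prompt_embedding: Sequence[float],
) -> int:
    """Return the index of the segment most similar to the source prompt.

    Assumption: the embeddings come from a CLIP-style image encoder
    (segments) and text encoder (prompt) that share one vector space.
    """
    if not segment_embeddings:
        raise ValueError("no segments to rank")
    scores = [cosine_similarity(e, prompt_embedding) for e in segment_embeddings]
    # Pick the highest-scoring segment; ties resolve to the earliest index.
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for three segment embeddings and one prompt embedding.
toy_segments = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
toy_prompt = [0.0, 1.0]
target_index = select_target_segment(toy_segments, toy_prompt)  # index 2
```

In a full pipeline, the mask of the selected segment would then be handed to the diffusion model together with the target prompt, so that only that region is regenerated.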

Experimental Results

Research Questions

  • RQ1: Can text prompts guide diffusion models to edit specific segments of an image via segmentation-based targeting?
  • RQ2: How well does a sequential, iterative editing process handle complex prompts and multi-object edits?
  • RQ3: What are the benefits of training Chinese CLIP and Chinese Stable Diffusion for Chinese-language prompts and scenarios?

Key Findings

  • Edit Everything can edit any object in an image and adapt to different illustration styles with high realism.
  • The system supports iterative, step-by-step replacement to achieve complex prompt compliance.
  • The Chinese CLIP and Chinese Stable Diffusion models trained by the authors outperform open-source alternatives in Chinese-language scenarios.
  • Zero-shot generation is possible, with iterative refinement improving alignment to complex prompts.
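
The step-by-step replacement described in the findings above can be pictured as a simple loop over (source prompt, target prompt) pairs, where each pass edits one object and feeds its result into the next pass. The sketch below is an illustration under that assumption, not the authors' implementation: `edit_iteratively` and `toy_edit` are hypothetical names, and the "image" is a plain list of object labels so that the control flow can run without any model.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")
EditStep = Tuple[str, str]  # (source_prompt, target_prompt)


def edit_iteratively(
    image: Image,
    steps: Iterable[EditStep],
    edit_once: Callable[[Image, str, str], Image],
) -> Image:
    """Apply single-object edits in order, chaining each result into the next."""
    for source_prompt, target_prompt in steps:
        image = edit_once(image, source_prompt, target_prompt)
    return image


def toy_edit(image: List[str], source_prompt: str, target_prompt: str) -> List[str]:
    """Hypothetical stand-in for one segment-rank-generate pass."""
    return [target_prompt if obj == source_prompt else obj for obj in image]


# A complex request is split into two simple replacements.
result = edit_iteratively(
    ["dog", "bench", "tree"],
    [("dog", "cat"), ("bench", "sofa")],
    toy_edit,
)
# result == ["cat", "sofa", "tree"]
```

Passing the single-edit function in as a parameter keeps the loop independent of any particular segmentation or diffusion backend.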


This review was generated by AI and checked by a human editor.