QUICK REVIEW

[Paper Review] The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering

Yiğit Ekin, Yossi Gandelsman|arXiv (Cornell University)|Mar 18, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

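One-Sentence Summary

A training-free framework that achieves continuous, controllable image editing by steering the text-embedding space of text-conditioned generative models, using an LLM-driven pipeline to construct debiased contrastive prompts and an elastic range search to obtain smooth edits.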

ABSTRACT

We present a training-free framework for continuous and controllable image editing at test time for text-conditioned generative models. In contrast to prior approaches that rely on additional training or manual user intervention, we find that a simple steering in the text-embedding space is sufficient to produce smooth edit control. Given a target concept (e.g., enhancing photorealism or changing facial expression), we use a large language model to automatically construct a small set of debiased contrastive prompt pairs, from which we compute a steering vector in the generator's text-encoder space. We then add this vector directly to the input prompt representation to control generation along the desired semantic axis. To obtain a continuous control, we propose an elastic range search procedure that automatically identifies an effective interval of steering magnitudes, avoiding both under-steering (no-edit) and over-steering (changing other attributes). Adding the scaled versions of the same vector within this interval yields smooth and continuous edits. Since our method modifies only textual representations, it naturally generalizes across text-conditioned modalities, including image and video generation. To quantify the steering continuity, we introduce a new evaluation metric that measures the uniformity of semantic change across edit strengths. We compare the continuous editing behavior across methods and find that, despite its simplicity and lightweight design, our approach is comparable to training-based alternatives, outperforming other training-free methods.

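Motivation and Objectives

Provide a lightweight, plug-and-play approach to fine-grained image editing that requires no retraining or additional modules.
Achieve continuous controllability through a simple linear intervention on text-encoder representations.
Use an LLM to automate contrastive prompt construction and token selection so that edits stay semantically focused.
Develop an adaptive, data-driven method for identifying the effective interval of steering strengths that yields smooth edits.
Propose a new metric for evaluating the continuity of semantic change across edit strengths.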

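Proposed Method

Compute a steering direction in text-embedding space by taking the mean difference over debiased contrastive prompt pairs at the LLM-identified tokens, then aggregate the result into a steering vector.
Add the steering vector to the text encoder's representation of the input prompt so that generation moves along the desired semantic axis (Equation 2).
Use elastic range search to automatically identify the effective interval of steering strengths, then apply scaled versions of the vector within that interval for continuous editing (algorithm and description in Section 3.3).
Use an LLM to automate token selection, choosing the concept-relevant tokens to steer, and classify each edit as local, global, or stylistic (Section 3.2).
Debias via style-token pooling to isolate the target attribute from entangled cues (Section 3.1.3).
Introduce a new continuity metric (MID distance) to quantify how smoothly semantics change across edit strengths (Section 4.3).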
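
The first two steps listed above (a mean-difference steering vector, then a scaled addition to the prompt representation) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `steering_vector` and `apply_steering`, the array shapes, and the boolean `token_mask` (standing in for the LLM-selected tokens) are all assumptions made for illustration, and the exact form of the paper's Equation 2 is not reproduced here.

```python
from typing import Optional

import numpy as np


def steering_vector(pos_embs: np.ndarray, neg_embs: np.ndarray) -> np.ndarray:
    """Mean difference over contrastive prompt pairs.

    Assumed shapes: both inputs are (num_pairs, dim), where row i of
    pos_embs and row i of neg_embs come from the same contrastive pair.
    """
    if pos_embs.shape != neg_embs.shape:
        raise ValueError("positive and negative embeddings must be paired")
    # Average the per-pair differences into a single direction of shape (dim,).
    return (pos_embs - neg_embs).mean(axis=0)


def apply_steering(
    prompt_emb: np.ndarray,
    vector: np.ndarray,
    alpha: float,
    token_mask: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Add alpha * vector to selected token embeddings.

    Assumed shapes: prompt_emb is a float array of shape (num_tokens, dim);
    token_mask is a boolean array of shape (num_tokens,). If no mask is
    given, every token is steered.
    """
    steered = prompt_emb.copy()  # leave the caller's array untouched
    if token_mask is None:
        token_mask = np.ones(prompt_emb.shape[0], dtype=bool)
    # The (dim,) vector broadcasts across all selected token rows.
    steered[token_mask] += alpha * vector
    return steered
```

In this sketch, sweeping `alpha` over a range of values while holding `vector` fixed is what would produce a slider-like sequence of edits; the embeddings themselves would come from whatever text encoder the generator uses.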

Figure 1: Our framework. Given a user text prompt, our method enables controllable editing in text-to-image generation without retraining. (a) In the default setting, the prompt is encoded by the text encoder and used by the generative pipeline to produce an image. (b) To introduce edit control, we

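Experimental Results

Research Questions

RQ1: Can steering text-encoder representations alone, with no training or architectural changes, achieve continuous and interpretable image editing?
RQ2: Does the automated, LLM-guided pipeline for contrastive prompt generation and token selection produce robust, semantically focused edits across diverse concepts?
RQ3: Does elastic range search deliver smooth, perceptually consistent edits while avoiding over- and under-editing across different backbone models?
RQ4: How does steering in text-embedding space compare with training-based methods in edit strength, content preservation, and slider continuity?
RQ5: Does the method transfer to different text-conditioned generators, including image and video modalities?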

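Key Findings

On stronger backbones, the proposed text-embedding steering framework achieves controllability competitive with training-based controllers.
Elastic range search automatically identifies steering strengths that produce perceptually smooth edits, avoiding under-editing and over-editing artifacts.
LLM-guided token selection, combined with debiasing via style-token pooling, enables concept-specific, localized edits while better preserving the original content.
The method stays lightweight and generalizes to text-conditioned modalities, including video generation, because it operates entirely within the text-encoder space.
Compared with training-free baselines, the method shows stronger edit compliance and smoother slider behavior; on stronger backbones, it is competitive with training-based methods.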
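
This summary does not spell out the steps of elastic range search, so the sketch below is only one plausible way to locate an interval of steering strengths that avoids both under- and over-editing. It is not the paper's algorithm. It assumes a hypothetical score `change(alpha)` that measures how far the steered output has moved from the unsteered one and that does not decrease as `alpha` grows; the thresholds `low_thresh` and `high_thresh`, the doubling schedule, and the function name `find_steering_interval` are likewise illustrative assumptions.

```python
from typing import Callable, Tuple

import numpy as np


def find_steering_interval(
    change: Callable[[float], float],
    low_thresh: float = 0.1,    # assumed: below this, the edit is not visible
    high_thresh: float = 0.6,   # assumed: above this, other attributes drift
    start: float = 1.0,
    grow: float = 2.0,
    max_doublings: int = 10,
    tol: float = 1e-2,
) -> Tuple[float, float]:
    """Return (alpha_min, alpha_max) bracketing the usable strengths.

    Assumes change(alpha) is non-decreasing in alpha.
    """
    # Phase 1: grow the upper bound until the over-steering threshold is hit.
    # If it is never hit, the last (largest) candidate is used as the bound.
    hi = start
    for _ in range(max_doublings):
        if change(hi) >= high_thresh:
            break
        hi *= grow

    def bisect(target: float, lo: float, hi: float) -> float:
        # Phase 2: shrink [lo, hi] around the first strength reaching target.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if change(mid) < target:
                lo = mid
            else:
                hi = mid
        return hi

    alpha_max = bisect(high_thresh, 0.0, hi)
    alpha_min = bisect(low_thresh, 0.0, alpha_max)
    return alpha_min, alpha_max


def slider_strengths(alpha_min: float, alpha_max: float, steps: int) -> np.ndarray:
    """Evenly spaced strengths inside the interval, one per slider position."""
    return np.linspace(alpha_min, alpha_max, steps)
```

In practice `change` would have to be computed from generated outputs (for example, with an image-similarity model), which makes every evaluation costly; the doubling-then-bisection structure is chosen here only because it keeps the number of evaluations logarithmic in the search range.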

Figure 2: Illustration of bias inheritance in steering. When the age direction is computed from a biased dataset (e.g., predominantly old men), the resulting steering vector entangles gender with age. Consequently, age manipulations not only modify apparent age but also introduce unintended gender-r


This review was generated by AI and reviewed by a human editor.