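[Paper Walkthrough] Prompt Optimization Via Diffusion Language Models
This paper proposes a Diffusion Language Model (DLM) framework that iteratively refines system prompts through masked denoising, conditioned on interaction traces, to improve the performance of a frozen downstream LLM whose gradients are not accessible.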
We propose a diffusion-based framework for prompt optimization that leverages Diffusion Language Models (DLMs) to iteratively refine system prompts through masked denoising. By conditioning on interaction traces, including user queries, model responses, and optional feedback, our method enables flexible, span-level prompt updates without requiring gradient access or modifying the downstream language model. Across diverse benchmarks (e.g., $τ$-bench, SST-2, SST-5), DLM-optimized prompts consistently improve the performance of a frozen target LLM (e.g., GPT-4o-mini). We further show that moderate diffusion step counts provide the best balance between refinement quality and stability. These results highlight diffusion-based prompt optimization as a general, model-agnostic, and scalable approach for enhancing LLM performance through iterative prompt refinement.
Research Motivation and Goals
- Motivate and develop a diffusion-based approach for dynamic, feedback-driven prompt optimization.
- Enable span-level system-prompt updates without modifying the downstream model or accessing its gradients.
- Demonstrate the generality and scalability of DLM-based prompt refinement across diverse tasks.
- Analyze the impact of diffusion steps on prompt refinement quality and stability.
Proposed Method
- Utilize Diffusion Language Models to mask and denoise targeted spans of the system prompt within an interaction trace.
- Condition the denoising process on the user query, model output, and optional feedback.
- Iteratively refine the masked system prompt for a fixed number of iterations without changing the target LLM.
- Evaluate DLM-based prompt optimization against autoregressive and gradient-based prompt methods across multiple benchmarks.
- Investigate the effect of varying diffusion step counts on refinement quality and stability.
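The mask-and-denoise loop described in the list above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `Trace`, `mask_span`, `refine_prompt`, and `toy_denoiser` are hypothetical names, whitespace splitting stands in for real tokenization, and the toy denoiser is a placeholder for an actual DLM (it ignores the trace and the step count). The sketch also assumes the denoiser returns a sequence of the same length as its input.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical mask token


@dataclass
class Trace:
    """One interaction trace used as conditioning context."""
    user_query: str
    model_response: str
    feedback: str = ""  # feedback is optional


# Hypothetical denoiser interface: (masked tokens, trace, diffusion steps) -> filled tokens
Denoiser = Callable[[List[str], Trace, int], List[str]]


def mask_span(tokens: List[str], start: int, end: int) -> List[str]:
    """Replace tokens[start:end] with mask tokens and keep everything else."""
    return tokens[:start] + [MASK] * (end - start) + tokens[end:]


def refine_prompt(
    prompt: str,
    spans: List[Tuple[int, int]],
    trace: Trace,
    denoise: Denoiser,
    iterations: int = 1,
    diffusion_steps: int = 64,
) -> str:
    """Refine only the given spans of the prompt; the target LLM is never touched."""
    tokens = prompt.split()
    for _ in range(iterations):
        for start, end in spans:
            masked = mask_span(tokens, start, end)
            filled = denoise(masked, trace, diffusion_steps)
            # Unmasked tokens stay fixed; only masked slots take the denoiser's output.
            tokens = [f if m == MASK else m for m, f in zip(masked, filled)]
    return " ".join(tokens)


def toy_denoiser(masked: List[str], trace: Trace, num_steps: int) -> List[str]:
    """Stand-in for a real DLM: fills every masked slot with a fixed word."""
    return ["concisely" if t == MASK else t for t in masked]


trace = Trace("Where is my order?", "A very long reply...", feedback="Too verbose")
print(refine_prompt("Answer the user verbosely", [(3, 4)], trace, toy_denoiser))
# prints "Answer the user concisely"
```

Because only the masked positions are overwritten, the rest of the system prompt survives each update unchanged, which is what makes the edits span-level.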
Experimental Results
Research Questions
- RQ1: Can DLMs effectively optimize system prompts by iteratively masking and refining spans in response to interaction traces?
- RQ2: Does diffusion-based prompt optimization improve the performance of a frozen downstream LLM across diverse tasks?
- RQ3: What is the optimal range of diffusion steps for balancing refinement quality and stability?
- RQ4: How does DLM-based prompt optimization compare to autoregressive and gradient-based prompt editing methods?
- RQ5: Does the approach generalize across function-calling, sentiment analysis, semantic similarity, and NLI tasks?
Key Findings
| Model | τ-bench (airline) | τ-bench (retail) | SST-2 | SST-5 | MRPC | SNLI |
|---|---|---|---|---|---|---|
| Dream-7B | 0.50 | 0.46 | 0.97 | 0.67 | 0.69 | 0.93 |
| Llama3-8B | 0.41 | 0.42 | 0.96 | 0.63 | 0.69 | 0.92 |
| Qwen3-8B | 0.42 | 0.46 | 0.96 | 0.65 | 0.69 | 0.92 |
| TextGrad | 0.50 | 0.45 | 0.97 | 0.67 | 0.70 | 0.93 |
| Baseline | 0.43 | 0.42 | 0.93 | 0.55 | 0.61 | 0.88 |
- DLM-based prompt optimization (Dream-7B) improves on the baseline on all six benchmarks in the table.
- Dream-7B's largest gains are on SST-5 (0.55 to 0.67) and MRPC (0.61 to 0.69); SNLI rises from 0.88 to 0.93.
- Compared to AR prompt optimizers and TextGrad, DLM optimization shows competitive or superior gains without requiring model gradients.
- A diffusion step count of around 64 offers the best balance between refinement quality and stability, with diminishing returns beyond that point.
- Prompt updates can be performed by masking only portions of the system prompt, enabling span-level edits without altering the target model.
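To make the comparisons in the findings easier to check, the per-benchmark differences can be computed directly from the table. The sketch below only copies the reported scores and subtracts the baseline row; the names `TASKS`, `SCORES`, and `gains` are illustrative, and the evaluation metric behind the scores is not stated in this summary.

```python
# Scores copied from the results table, in column order.
TASKS = ["tau-bench-airline", "tau-bench-retail", "SST-2", "SST-5", "MRPC", "SNLI"]
SCORES = {
    "Dream-7B":  [0.50, 0.46, 0.97, 0.67, 0.69, 0.93],
    "Llama3-8B": [0.41, 0.42, 0.96, 0.63, 0.69, 0.92],
    "Qwen3-8B":  [0.42, 0.46, 0.96, 0.65, 0.69, 0.92],
    "TextGrad":  [0.50, 0.45, 0.97, 0.67, 0.70, 0.93],
    "Baseline":  [0.43, 0.42, 0.93, 0.55, 0.61, 0.88],
}


def gains(model: str, reference: str = "Baseline") -> dict:
    """Per-task score difference between one row and a reference row."""
    return {
        task: round(m - r, 2)
        for task, m, r in zip(TASKS, SCORES[model], SCORES[reference])
    }


for name in SCORES:
    if name != "Baseline":
        print(name, gains(name))
```

For Dream-7B every difference is positive, with SST-5 the largest at 0.12. Llama3-8B is the one row that falls below the baseline on a benchmark, by 0.02 on τ-bench airline.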
This walkthrough was generated by AI and reviewed by a human editor.