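[Paper Explained] InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
InstructZero optimizes a low-dimensional soft prompt on an open-source LLM so that it generates instructions for a black-box LLM, using Bayesian optimization to improve zero-shot task performance without backpropagating through the API model.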
Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.
Research Motivation and Objectives
- Automate instruction search to improve zero-shot performance for black-box LLMs.
- Reduce combinatorial instruction optimization to low-dimensional continuous optimization.
- Leverage in-context learning of open-source LLMs to generate task-specific instructions.
- Align latent soft-prompt kernels with instruction similarities to enhance optimization.
Proposed Method
- Transform the discrete instruction search into continuous optimization by learning a soft prompt p for an open-source LLM that generates a task instruction v.
- Apply a random projection to reduce the soft-prompt dimension from d' to d for tractable optimization.
- Formulate the objective as a black-box function H(p) measuring zero-shot performance after applying v to the black-box LLM f, and optimize H(p) via Bayesian optimization.
- Introduce an instruction-coupled kernel that aligns the latent-space prompt similarities with instruction similarities, ensuring BO explores instruction-relevant regions.
- Use Gaussian Process priors and Expected Improvement as the BO framework to update posteriors and select next prompts.
- Iterate until convergence to produce the best instruction v* for the target task.
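The steps above can be sketched in Python as a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical stand-in for H(p) (the real method decodes an instruction v with the open-source LLM and scores it on the black-box LLM f), a plain RBF kernel is swapped in for the instruction-coupled kernel, candidates are drawn at random rather than found by optimizing the acquisition function, and every name and default value is illustrative.

```python
import math
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    """Squared-exponential kernel between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_train, y_train, X_cand, noise=1e-4):
    """GP posterior mean and std at the candidates (constant prior mean, unit prior variance)."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand)            # shape (n_train, n_cand)
    K_inv = np.linalg.inv(K)
    y_mean = y_train.mean()
    mean = y_mean + K_s.T @ (K_inv @ (y_train - y_mean))
    var = 1.0 - np.sum(K_s * (K_inv @ K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected Improvement for maximization."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def toy_score(p_full, target):
    """Hypothetical stand-in for H(p); a real run would query both LLMs here."""
    return float(np.exp(-np.mean((p_full - target) ** 2)))

def instructzero_sketch(d=5, d_full=64, n_init=5, n_iter=20, n_cand=500, seed=0):
    rng = np.random.default_rng(seed)
    # Random projection A maps the low-dimensional p (size d) to the full soft prompt (size d').
    A = rng.normal(size=(d_full, d)) / math.sqrt(d)
    target = A @ rng.uniform(-1, 1, size=d)      # hidden optimum of the toy objective
    P = rng.uniform(-1, 1, size=(n_init, d))     # initial low-dimensional prompts
    scores = np.array([toy_score(A @ p, target) for p in P])
    for _ in range(n_iter):
        cand = rng.uniform(-1, 1, size=(n_cand, d))
        mean, std = gp_posterior(P, scores, cand)
        ei = expected_improvement(mean, std, scores.max())
        p_next = cand[np.argmax(ei)]             # next prompt chosen by EI
        P = np.vstack([P, p_next])
        scores = np.append(scores, toy_score(A @ p_next, target))
    best = int(np.argmax(scores))
    return P[best], scores[best], scores

if __name__ == "__main__":
    best_p, best_score, history = instructzero_sketch()
    print("best toy score:", best_score)
```

Only the low-dimensional vector p is ever modeled by the Gaussian process; the projection A is fixed once, which is what keeps the search space small enough for Bayesian optimization in this sketch.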
Experimental Results
Research Questions
- RQ1: How can instruction optimization be effectively performed for black-box LLMs without gradient access?
- RQ2: Can a soft prompt in a latent space, coupled with an open-source LLM, generate high-quality instructions for black-box models?
- RQ3: Does an instruction-coupled kernel improve Bayesian optimization efficiency by aligning latent and instruction spaces?
- RQ4: Is InstructZero able to outperform state-of-the-art auto-instruction methods across multiple tasks?
- RQ5: What is the impact of using smaller open-source models to optimize instructions for larger API LLMs?
Key Findings
- InstructZero significantly outperforms baselines (APE and Uniform) on a broad set of tasks.
- ChatGPT’s zero-shot performance improves when guided by InstructZero-generated instructions, achieving SOTA on 32/32 tasks from BIG-Bench in the reported setting.
- The method can match or exceed results obtained with larger models by optimizing instructions via a smaller open-source LLM.
- Ablation shows that optimizing the soft prompt yields substantial gains over manual prompts or using exemplars alone.
- Visualization indicates progressive improvement of instructions and effective exploration-exploitation in the latent space across iterations.
This summary was generated by AI and reviewed by a human editor.