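[Paper Review] InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
InstructZero optimizes a low-dimensional soft prompt for an open-source LLM so that it generates instructions for a black-box LLM, and it uses Bayesian optimization to improve zero-shot task performance without backpropagating through the API model.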
Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.
Research Motivation and Objectives
- Automate instruction search to improve zero-shot performance for black-box LLMs.
- Reduce combinatorial instruction optimization to low-dimensional continuous optimization.
- Leverage in-context learning of open-source LLMs to generate task-specific instructions.
- Align latent soft-prompt kernels with instruction similarities to enhance optimization.
Proposed Method
- Transform the discrete instruction search into continuous optimization by learning a soft prompt p for an open-source LLM that generates a task instruction v.
- Apply a random projection to reduce the soft-prompt dimension from d' to d for tractable optimization.
- Formulate the objective as a black-box function H(p) measuring zero-shot performance after applying v to the black-box LLM f, and optimize H(p) via Bayesian optimization.
- Introduce an instruction-coupled kernel that aligns the latent-space prompt similarities with instruction similarities, ensuring BO explores instruction-relevant regions.
- Use Gaussian Process priors and Expected Improvement as the BO framework to update posteriors and select next prompts.
- Iterate until convergence to produce the best instruction v* for the target task.
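The steps above can be sketched as a small Bayesian optimization loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical stand-in for H(p) (the real method would decode the projected soft prompt into an instruction v with the open-source LLM and score v on the black-box LLM), a plain RBF kernel is swapped in for the instruction-coupled kernel, Expected Improvement is maximized over random candidates rather than with a dedicated optimizer, and all function names and default sizes are made up for the example.

```python
import math
import numpy as np


def rbf_kernel(X, Y, length_scale=1.0):
    """Squared-exponential kernel between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(X_obs, y_obs, X_cand, noise=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP at X_cand."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_cand)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected Improvement for maximization under a Gaussian posterior."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def toy_score(projected_prompt):
    """Hypothetical stand-in for H(p); the real objective queries two LLMs."""
    return -float(np.mean((projected_prompt - 0.3) ** 2))


def optimize_soft_prompt(score_fn, d=5, d_full=64, n_init=5, n_iter=15,
                         n_cand=256, seed=0):
    rng = np.random.default_rng(seed)
    # Random projection A maps the d-dim search space to the d_full-dim
    # soft-prompt space expected by the open-source LLM.
    A = rng.normal(size=(d_full, d)) / math.sqrt(d)

    # Initial design: random low-dimensional prompts and their scores.
    P = rng.uniform(-1.0, 1.0, size=(n_init, d))
    y = np.array([score_fn(A @ p) for p in P])

    for _ in range(n_iter):
        # Standardize scores so the unit-variance GP prior is reasonable.
        y_std = (y - y.mean()) / (y.std() + 1e-9)
        cand = rng.uniform(-1.0, 1.0, size=(n_cand, d))
        mean, std = gp_posterior(P, y_std, cand)
        ei = expected_improvement(mean, std, y_std.max())

        # Evaluate the candidate with the highest EI and extend the dataset.
        p_next = cand[np.argmax(ei)]
        P = np.vstack([P, p_next])
        y = np.append(y, score_fn(A @ p_next))

    best = int(np.argmax(y))
    return P[best], y[best], y


if __name__ == "__main__":
    best_p, best_score, history = optimize_soft_prompt(toy_score)
    print("best score:", best_score, "after", len(history), "evaluations")
```

In a real setting each call to `score_fn` costs one instruction generation plus a batch of black-box API queries, which is why the search runs in the small d-dimensional space and spends few evaluations.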
Experimental Results
Research Questions
- RQ1: How can instruction optimization be effectively performed for black-box LLMs without gradient access?
- RQ2: Can a soft prompt in a latent space, coupled with an open-source LLM, generate high-quality instructions for black-box models?
- RQ3: Does an instruction-coupled kernel improve Bayesian optimization efficiency by aligning latent and instruction spaces?
- RQ4: Is InstructZero able to outperform state-of-the-art auto-instruction methods across multiple tasks?
- RQ5: What is the impact of using smaller open-source models to optimize instructions for larger API LLMs?
Key Results
- InstructZero significantly outperforms baselines (APE and Uniform) on a broad set of tasks.
- ChatGPT’s zero-shot performance improves when guided by InstructZero-generated instructions, achieving SOTA on 32/32 tasks from BIG-Bench in the reported setting.
- The method can match or exceed results obtained with larger models by optimizing instructions via a smaller open-source LLM.
- Ablation shows that optimizing the soft prompt yields substantial gains over manual prompts or using exemplars alone.
- Visualization indicates progressive improvement of instructions and effective exploration-exploitation in the latent space across iterations.
This review was generated by AI and reviewed by a human editor.