QUICK REVIEW

[Paper Review] InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

Lichang Chen, Jiuhai Chen|arXiv (Cornell University)|2023. 06. 05.

Topic Modeling | Citations: 11

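One-Line Summary

InstructZero optimizes a low-dimensional soft prompt for an open-source LLM so that it generates instructions for a black-box LLM, and uses Bayesian optimization to improve zero-shot task performance without backpropagating through the API model.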

ABSTRACT

Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.

Motivation and Objectives

Automate instruction search to improve zero-shot performance for black-box LLMs.
Reduce combinatorial instruction optimization to low-dimensional continuous optimization.
Leverage in-context learning of open-source LLMs to generate task-specific instructions.
Align latent soft-prompt kernels with instruction similarities to enhance optimization.
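The reduction named in the second objective can be written out as a pair of optimization problems. This is a sketch in assumed notation rather than the paper's exact formulation: f (black-box LLM), v (instruction), p (soft prompt) and H(p) follow the review's own symbols, while g (the open-source LLM), A (a random projection matrix), E (in-context exemplars), h (a task score function) and D (the task's data distribution) are names introduced here for illustration.

```latex
% Discrete problem: search over instructions v in a combinatorial space V
\max_{v \in \mathcal{V}} \;
  \mathbb{E}_{(X, Y) \sim \mathcal{D}} \, h\bigl(f([v; X]),\, Y\bigr)

% Continuous surrogate: the instruction is generated from a soft prompt,
% v = g(Ap; E), so the search runs over p in R^d with d much smaller than d'
\max_{p \in \mathbb{R}^{d}} \; H(p), \qquad
  H(p) = \mathbb{E}_{(X, Y) \sim \mathcal{D}} \,
         h\bigl(f([\,g(Ap; E);\, X]),\, Y\bigr)
```

Because H(p) can only be evaluated by querying f, it is treated as a black-box function, which is what makes a gradient-free method such as Bayesian optimization a natural fit.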

Proposed Method

Transform the discrete instruction search into continuous optimization by learning a soft prompt p for an open-source LLM that generates a task instruction v.
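Apply a random projection so that only a d-dimensional vector p is optimized: p is mapped into the open-source LLM's d'-dimensional soft-prompt space (with d much smaller than d'), which keeps the optimization tractable.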
Formulate the objective as a black-box function H(p) measuring zero-shot performance after applying v to the black-box LLM f, and optimize H(p) via Bayesian optimization.
Introduce an instruction-coupled kernel that aligns the latent-space prompt similarities with instruction similarities, ensuring BO explores instruction-relevant regions.
Use Gaussian Process priors and Expected Improvement as the BO framework to update posteriors and select next prompts.
Iterate until convergence to produce the best instruction v* for the target task.
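The loop described in the steps above can be sketched in a few dozen lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical stand-in for H(p) (in InstructZero this step would generate an instruction with the open-source LLM and score it on the black-box LLM), the kernel is a plain RBF kernel rather than the instruction-coupled kernel, and Expected Improvement is maximized by random candidate sampling. All function names and default values are chosen here for illustration.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and std of a zero-mean GP at the query points."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)        # K^{-1} k(X, x*)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)   # k(x*, x*) = 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """EI for maximization: E[max(f - best, 0)] with f ~ N(mean, std^2)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def toy_score(full_prompt, target):
    """Hypothetical stand-in for H(p); a real run would query two LLMs here."""
    return -float(np.mean((full_prompt - target) ** 2))

def optimize(d=4, d_full=64, n_init=5, n_iter=20, n_candidates=256, seed=0):
    rng = np.random.default_rng(seed)
    # Random projection from the d-dim search space to the d_full-dim prompt space.
    projection = rng.normal(size=(d, d_full)) / math.sqrt(d)
    target = rng.uniform(-1, 1, size=d) @ projection   # toy optimum, score 0

    prompts = rng.uniform(-1, 1, size=(n_init, d))
    scores = np.array([toy_score(p @ projection, target) for p in prompts])
    for _ in range(n_iter):
        candidates = rng.uniform(-1, 1, size=(n_candidates, d))
        # Standardize scores so the zero-mean, unit-variance GP prior is reasonable.
        y = (scores - scores.mean()) / (scores.std() + 1e-9)
        mean, std = gp_posterior(prompts, y, candidates)
        nxt = candidates[np.argmax(expected_improvement(mean, std, y.max()))]
        prompts = np.vstack([prompts, nxt])
        scores = np.append(scores, toy_score(nxt @ projection, target))
    best = int(np.argmax(scores))
    return prompts[best], scores[best], scores

if __name__ == "__main__":
    best_prompt, best_score, history = optimize()
    print("best score:", best_score, "after", len(history), "evaluations")
```

Each call to `toy_score` corresponds to one round trip through both models in the real method, so the number of evaluations (`n_init + n_iter`) is the quantity a practitioner would budget for.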

Experimental Results

Research Questions

RQ1: How can instruction optimization be effectively performed for black-box LLMs without gradient access?
RQ2: Can a soft prompt in a latent space, coupled with an open-source LLM, generate high-quality instructions for black-box models?
RQ3: Does an instruction-coupled kernel improve Bayesian optimization efficiency by aligning latent and instruction spaces?
RQ4: Is InstructZero able to outperform state-of-the-art auto-instruction methods across multiple tasks?
RQ5: What is the impact of using smaller open-source models to optimize instructions for larger API LLMs?

Key Findings

InstructZero significantly outperforms baselines (APE and Uniform) on a broad set of tasks.
ChatGPT’s zero-shot performance improves when guided by InstructZero-generated instructions, achieving SOTA on 32/32 tasks from BIG-Bench in the reported setting.
The method can match or exceed results obtained with larger models by optimizing instructions via a smaller open-source LLM.
Ablation shows that optimizing the soft prompt yields substantial gains over manual prompts or using exemplars alone.
Visualization indicates progressive improvement of instructions and effective exploration-exploitation in the latent space across iterations.

This review was generated by AI and reviewed by a human editor.