[Paper Review] Output-Space Search: Targeting LLM Generations in a Frozen Encoder-Defined Output Space
tldr: OS-Search converts LLM generation into an endpoint search over a frozen encoder-defined output space Z, using a retrieval-grounded controller conditioned on targets z*, which enables parallel sweeps and black-box optimization in story-generation and code settings.
We introduce Output-Space Search (OS-Search), which turns LLM generation into endpoint search. An outer loop selects a target z* in a frozen encoder-defined 3D output space Z, and a retrieval-grounded policy trained with sequence-level RL generates outputs whose coordinates land near z* under standard autoregressive decoding. This enables parallel sweeps and black-box optimization in Z without path-dependent token/program search. On stories, sweeping Z_text yields 3.1x higher LLM-scored diversity than prompt-chaining. On code, Bayesian optimization over Z_code improves an objective withheld from the controller under matched inference budgets while preserving validity.
Research Motivation and Goals
- Introduce Output-Space Search (OS-Search) that defines a fixed, external output space Z for task outputs using a frozen encoder and projection.
- Train a retrieval-grounded, sequence-level RL controller that follows requested coordinates z* in Z and self-reports ẑ.
- Enable outer-loop search over z* (grid, random, or Bayesian optimization) to explore or optimize outputs without decoding-time guidance.
- Demonstrate the benefits of OS-Search through story-diversity gains and code-objective improvements under matched budgets.
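The outer loop treats the controller as a black box: choose targets z*, query the actuator once per target, and score the outputs with an evaluator the controller never sees. The following is a minimal Python sketch of grid and random sweeps under that reading, not the authors' code. `actuator(prompt, z_target)`, `toy_actuator`, and `toy_evaluator` are hypothetical stand-ins so the sketch runs without a model; Bayesian optimization would swap out the target generator but keep the same interface.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A requested coordinate z* in the 3-D output space Z (d_z = 3).
Target = Tuple[float, float, float]


def grid_targets(lo: float, hi: float, steps: int) -> List[Target]:
    """Evenly spaced z* values on a steps x steps x steps grid over [lo, hi]^3."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    axis = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return list(itertools.product(axis, repeat=3))


def random_targets(lo: float, hi: float, n: int, seed: int = 0) -> List[Target]:
    """n z* values drawn uniformly from [lo, hi]^3."""
    rng = random.Random(seed)
    return [(rng.uniform(lo, hi), rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n)]


def sweep(actuator: Callable[[str, Target], str], prompt: str,
          targets: List[Target], max_workers: int = 8) -> List[Tuple[Target, str]]:
    """Call the black-box actuator F(p, z*) once per target.

    The calls share no decoding state, so they can run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda z: actuator(prompt, z), targets))
    return list(zip(targets, outputs))


def best_by_evaluator(results: List[Tuple[Target, str]],
                      f: Callable[[str], float]) -> Tuple[Target, str]:
    """Return the (z*, x) pair maximizing f(x); f is never shown to the actuator."""
    return max(results, key=lambda pair: f(pair[1]))


# Hypothetical stand-ins so the sketch runs without a model.
def toy_actuator(prompt: str, z: Target) -> str:
    return f"{prompt}|{z[0]:.2f},{z[1]:.2f},{z[2]:.2f}"


def toy_evaluator(x: str) -> float:
    # Toy objective: prefer outputs generated for a large first coordinate.
    return float(x.split("|")[1].split(",")[0])


if __name__ == "__main__":
    results = sweep(toy_actuator, "story", grid_targets(-1.0, 1.0, 3))
    print(len(results), best_by_evaluator(results, toy_evaluator))
```

Because each target is an independent call, the sweep parallelizes trivially. A path-based method like prompt-chaining cannot do this, since each step depends on the previous output.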
Proposed Method
- Define a frozen coordinate map z(x) = U^T(E(x) - μ), where E is the frozen encoder, μ a fixed centering vector, and U a fixed projection onto the d_z = 3 components of Z.
- Train a z*-conditioned controller πθ(·|p, z*) using group-based RL to generate structured completions that land near z* and produce a self-report ẑ_S.
- Ground numeric targets by retrieving nearby exemplars in Z and including them in prompts to make z* actionable for the generator.
- Use a reward capturing formatting validity, distance to z*, and calibration between ẑ_S and z(x) for sequence-level training.
- Expose a black-box actuator F that maps (p, z*) to a sample x, enabling parallel evaluation and outer-loop optimization over z* without modifying decoding.
- Optionally perform outer-loop sweeps or Bayesian optimization over z* with an evaluator f(x) withheld from the controller.
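The coordinate map, retrieval grounding, and reward can be sketched in plain Python as follows. This assumes the embedding E(x) is already computed, and that μ and U were fitted once and then frozen. The prompt layout in `build_prompt` and the exact reward shaping (equal-weight mean of a format term, an exponential closeness-to-z* term, and an exponential calibration term with hypothetical `alpha`/`beta`) are illustrative assumptions. The review names the three reward components but not how they are combined.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Exemplar = Tuple[str, Vector]  # (stored output text, its coordinates z(x))


def coordinates(embedding: Sequence[float], mu: Sequence[float],
                U: Sequence[Sequence[float]]) -> Vector:
    """z(x) = U^T (E(x) - mu), with U stored as d_e rows by d_z columns."""
    centered = [e - m for e, m in zip(embedding, mu)]
    d_z = len(U[0])
    return [float(sum(U[i][k] * centered[i] for i in range(len(centered)))) for k in range(d_z)]


def nearest_exemplars(z_target: Sequence[float], bank: List[Exemplar],
                      k: int = 2) -> List[Exemplar]:
    """Retrieve the k exemplars whose coordinates lie closest to z* (Euclidean)."""
    return sorted(bank, key=lambda item: math.dist(item[1], z_target))[:k]


def _fmt(z: Sequence[float]) -> str:
    return "(" + ", ".join(f"{c:.2f}" for c in z) + ")"


def build_prompt(task: str, z_target: Sequence[float], exemplars: List[Exemplar]) -> str:
    """Hypothetical prompt layout: task, numeric target, then nearby exemplars."""
    lines = [f"Task: {task}", f"Target z*: {_fmt(z_target)}"]
    lines += [f"Example at {_fmt(z)}: {text}" for text, z in exemplars]
    return "\n".join(lines)


def reward(valid_format: bool, z_out: Sequence[float], z_target: Sequence[float],
           z_self_report: Sequence[float], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical shaping in [0, 1]: zero for malformed output, otherwise the mean of
    a format term, a closeness-to-z* term, and a self-report calibration term."""
    if not valid_format:
        return 0.0
    targeting = math.exp(-alpha * math.dist(z_out, z_target))
    calibration = math.exp(-beta * math.dist(z_self_report, z_out))
    return (1.0 + targeting + calibration) / 3.0
```

Retrieval turns an abstract numeric target into concrete examples of outputs that already land near it. This is one plausible reason the ablations that remove or mismatch exemplars hurt targeting.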
Experimental Results
Research Questions
- RQ1: Can an autoregressive LLM expose a low-dimensional, state-like target in a frozen encoder-defined output space that controls where the generated output lands?
- RQ2: Does sweeping or optimizing targets z* in Z yield diverse, high-quality outputs in text/story and code domains without altering the decoding process?
- RQ3: Do retrieval-grounded exemplars near z* stabilize targeting and calibration of z(x) in practice?
- RQ4: How does OS-Search perform for multi-branch story generation versus code-domain objective optimization under budget constraints?
- RQ5: What are the limitations and safety considerations of using a fixed Z and outer-loop search for LLM generation?
Key Findings
- Target-tracking accuracy improves with best-of-K sampling in Z, showing calibration and targeting benefits across tasks.
- For stories, grid sweeps in Z_text yield substantial diversity gains (embeddings and lexical) with low degeneration compared to path-based baselines.
- For code, Bayesian optimization over Z_code improves a withheld objective under matched valid-program budgets and exceeds library-based baselines.
- Anchoring z1 in Z_text provides a stable axis correlated with a templatedness proxy (Slop-Score), supporting interpretable control.
- Retrieval grounding near z* is essential for effective targeting and validity, with ablations showing dramatic performance drops when exemplars are removed or mismatched.
- OS-Search demonstrates that a fixed external output space enables parallel branching and outer-loop optimization without decoding-time guidance.
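Best-of-K selection in Z can be illustrated with a short sketch. It draws K samples for one target, measures each sample's coordinates z(x) with the frozen map, and keeps the one closest to z*. Whether the paper selects on measured z(x) or on the self-report ẑ is not stated here, so selecting on z(x) is an assumption. The simulated controller (z* plus Gaussian noise) is a toy and produces no results from the paper.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, List[float]]  # (sampled output, its measured coordinates z(x))


def best_of_k(candidates: List[Candidate], z_target: Sequence[float]) -> Candidate:
    """Keep the candidate whose measured z(x) lands closest to z*."""
    return min(candidates, key=lambda c: math.dist(c[1], z_target))


def simulate_candidates(z_target: Sequence[float], k: int,
                        noise: float = 0.5, seed: int = 0) -> List[Candidate]:
    """Toy controller: each sample lands at z* plus isotropic Gaussian noise."""
    rng = random.Random(seed)
    return [(f"sample-{i}", [c + rng.gauss(0.0, noise) for c in z_target]) for i in range(k)]


def tracking_error_by_k(z_target: Sequence[float], ks: List[int],
                        noise: float = 0.5, seed: int = 0) -> Dict[int, float]:
    """Best-of-K tracking error for each K, using prefixes of one shared sample pool."""
    pool = simulate_candidates(z_target, max(ks), noise, seed)
    return {k: math.dist(best_of_k(pool[:k], z_target)[1], z_target) for k in ks}


if __name__ == "__main__":
    for k, err in tracking_error_by_k([0.2, -0.4, 0.1], [1, 2, 4, 8, 16]).items():
        print(f"K={k:2d}  error={err:.3f}")
```

Because each larger K searches a superset of the same pool, the error in this toy can only fall or stay flat. How much it falls in practice depends on how tightly the real controller clusters around z*, and each extra sample costs one more generation and one more encoder pass.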
This review was generated by AI and checked by a human editor.