QUICK REVIEW

[Paper Review] Output-Space Search: Targeting LLM Generations in a Frozen Encoder-Defined Output Space

Tobias Materzok | arXiv (Cornell University) | Jan 29, 2026
Algorithms and Data Compression | Citations: 0
One-Line Summary

TL;DR: OS-Search converts LLM generation into an endpoint search over a frozen encoder-defined output space Z. A retrieval-grounded controller conditioned on a target z* generates outputs that land near that target, which enables parallel sweeps and black-box optimization in both story and code settings.

ABSTRACT

We introduce Output-Space Search (OS-Search), which turns LLM generation into endpoint search. An outer loop selects a target z* in a frozen encoder-defined 3D output space Z, and a retrieval-grounded policy trained with sequence-level RL generates outputs whose coordinates land near z* under standard autoregressive decoding. This enables parallel sweeps and black-box optimization in Z without path-dependent token/program search. On stories, sweeping Z_text yields 3.1x higher LLM-scored diversity than prompt-chaining. On code, Bayesian optimization over Z_code improves an objective withheld from the controller under matched inference budgets while preserving validity.

Motivation and Objectives

  • Introduce Output-Space Search (OS-Search) that defines a fixed, external output space Z for task outputs using a frozen encoder and projection.
  • Train a retrieval-grounded, sequence-level RL controller that follows requested coordinates z* in Z and self-reports ẑ.
  • Enable outer-loop search over z* (grid, random, or Bayesian optimization) to explore or optimize outputs without decoding-time guidance.
  • Demonstrate OS-Search benefits through story diversity gains and code objective improvements under matched budgets.
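
The outer-loop idea in the objectives above can be illustrated with a minimal grid sweep. This is a sketch under stated assumptions, not the paper's implementation: `grid_targets`, `sweep`, `toy_actuator`, and `toy_evaluator` are hypothetical names, the toy actuator simply returns the requested target in place of a real controller call, and the toy evaluator stands in for an objective the controller never sees.

```python
import itertools

def grid_targets(low, high, steps, dims=3):
    """Return evenly spaced z* targets covering the cube [low, high]^dims."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    axis = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return [list(point) for point in itertools.product(axis, repeat=dims)]

def sweep(actuator, prompt, targets, evaluator):
    """Call the actuator once per target and return the best (score, z*, output).

    Each call depends only on (prompt, z*), so the loop body could be run
    in parallel; it is kept sequential here for clarity.
    """
    results = []
    for z_star in targets:
        output = actuator(prompt, z_star)
        results.append((evaluator(output), z_star, output))
    return max(results, key=lambda r: r[0])

# Hypothetical stand-ins for the controller and the withheld objective.
def toy_actuator(prompt, z_star):
    return list(z_star)  # pretend the output lands exactly on the target

def toy_evaluator(output):
    hidden_optimum = [0.0, 1.0, -1.0]  # assumed, for illustration only
    return -sum((a - b) ** 2 for a, b in zip(output, hidden_optimum))

targets = grid_targets(-1.0, 1.0, steps=3)
best_score, best_target, best_output = sweep(toy_actuator, "prompt", targets, toy_evaluator)
print(len(targets), best_target)
```

A random or Bayesian-optimization outer loop would replace only `grid_targets`; the actuator call itself stays unchanged, which is the point of searching over z* rather than over decoding paths.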

Proposed Method

  • Define a frozen coordinate map z(x) = U^T(E(x) - μ) in a fixed Z with d_z = 3 components.
  • Train a z*-conditioned controller πθ(·|p, z*) using group-based RL to generate structured completions that land near z* and produce a self-report ẑ_S.
  • Ground numeric targets by retrieving nearby exemplars in Z and including them in prompts to make z* actionable for the generator.
  • Use a reward capturing formatting validity, distance to z*, and calibration between ẑ_S and z(x) for sequence-level training.
  • Expose a black-box actuator F that maps (p, z*) to a sample x, enabling parallel evaluation and outer-loop optimization over z* without modifying decoding.
  • Optionally perform outer-loop sweeps or Bayesian optimization over z* with an evaluator f(x) withheld from the controller.
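
The coordinate map, retrieval grounding, and reward listed above can be sketched in a few lines of plain Python. Everything here is illustrative: the function names, the exponential reward shaping, and the weight `w_cal` are assumptions, since this review does not give the exact reward formula. Only the projection z(x) = U^T(E(x) - μ) is taken directly from the text, with the encoder output E(x) passed in as a precomputed vector.

```python
import math

def project(embedding, mean, basis):
    """Compute z = U^T (e - mu).

    `basis` holds the d_z columns of U, each as a list the same length
    as `embedding`; the result has one coordinate per column.
    """
    centered = [e - m for e, m in zip(embedding, mean)]
    return [sum(u * c for u, c in zip(column, centered)) for column in basis]

def nearest_exemplars(z_star, library, k=2):
    """Return the texts of the k library items whose coordinates are closest to z*.

    `library` is a list of (z, text) pairs; in this sketch these texts
    would be placed in the prompt to ground the numeric target.
    """
    ranked = sorted(library, key=lambda item: math.dist(item[0], z_star))
    return [text for _, text in ranked[:k]]

def reward(z_actual, z_target, z_reported, is_valid, w_cal=0.5):
    """Hypothetical sequence-level reward.

    Invalid formatting scores 0. Otherwise the reward grows as the output
    lands closer to z* and as the self-report matches the measured z(x).
    """
    if not is_valid:
        return 0.0
    targeting = math.exp(-math.dist(z_actual, z_target))
    calibration = math.exp(-math.dist(z_reported, z_actual))
    return targeting + w_cal * calibration

# Toy usage with a 4-dimensional embedding projected to d_z = 3.
basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
z = project([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], basis)
library = [([0, 0, 0], "a"), ([1, 1, 1], "b"), ([5, 5, 5], "c")]
print(z, nearest_exemplars([0.9, 0.9, 0.9], library))
```

Because the encoder, μ, and U are frozen, `project` gives the same coordinates for a given output throughout training, which is what lets an outer loop treat z* as a stable search variable.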

Experimental Results

Research Questions

  • RQ1: Can an autoregressive LLM expose a low-dimensional, state-like target in a frozen encoder-defined output space that controls where the generated output lands?
  • RQ2: Does sweeping or optimizing targets z* in Z yield diverse, high-quality outputs in text/story and code domains without altering the decoding process?
  • RQ3: Do retrieval-grounded exemplars near z* stabilize targeting and calibration of z(x) in practice?
  • RQ4: How does OS-Search perform for multi-branch story generation versus code-domain objective optimization under budget constraints?
  • RQ5: What are the limitations and safety considerations of using a fixed Z and outer-loop search for LLM generation?

Key Findings

  • Target-tracking accuracy improves with best-of-K sampling in Z, showing calibration and targeting benefits across tasks.
  • For stories, grid sweeps in Z_text yield substantial diversity gains (both embedding-based and lexical) with low degeneration compared to path-based baselines.
  • For code, Bayesian optimization over Z_code improves a withheld objective under matched valid-program budgets and exceeds library-based baselines.
  • Anchoring z1 in Z_text provides a stable axis correlated with a templatedness proxy (Slop-Score), supporting interpretable control.
  • Retrieval grounding near z* is essential for effective targeting and validity, with ablations showing dramatic performance drops when exemplars are removed or mismatched.
  • OS-Search demonstrates that a fixed external output space enables parallel branching and outer-loop optimization without decoding-time guidance.
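
The best-of-K finding above can be illustrated as follows. This is a minimal sketch, assuming a controller whose outputs land near z* with Gaussian noise; `best_of_k` and `make_noisy_sampler` are hypothetical names, and the identity function stands in for the frozen coordinate map z(x).

```python
import math
import random

def best_of_k(sample_fn, coord_fn, z_star, k):
    """Draw k candidates and keep the one whose coordinates are closest to z*."""
    candidates = [sample_fn(z_star) for _ in range(k)]
    return min(candidates, key=lambda x: math.dist(coord_fn(x), z_star))

def make_noisy_sampler(seed, sigma=0.5):
    """Hypothetical controller: returns a point scattered around the target."""
    rng = random.Random(seed)
    def sample(z_star):
        return [z + rng.gauss(0.0, sigma) for z in z_star]
    return sample

z_star = [0.0, 0.0, 0.0]
identity = lambda x: x  # stand-in for z(x)

# Same seed for both runs, so the K=1 draw is the first of the K=8 draws.
one = best_of_k(make_noisy_sampler(0), identity, z_star, k=1)
eight = best_of_k(make_noisy_sampler(0), identity, z_star, k=8)
print(math.dist(one, z_star), math.dist(eight, z_star))
```

Since selection uses only the frozen coordinate map and never the downstream objective, this kind of filtering tightens targeting without leaking the withheld evaluator into generation.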

This review was generated by AI and checked by a human editor.