QUICK REVIEW

[Paper Review] STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts

Zachary Bamberger, Till R. Saenger | arXiv (Cornell University) | Feb 15, 2026

Topic Modeling | Cited by 0

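One-Sentence Summary

STATe replaces temperature-based diversity with discrete, interpretable action templates that guide Tree-of-Thoughts reasoning, yielding greater output diversity and interpretable associations between actions and output quality.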

ABSTRACT

Inference-Time-Compute (ITC) methods like Best-of-N and Tree-of-Thoughts are meant to produce output candidates that are both high-quality and diverse, but their use of high-temperature sampling often fails to achieve meaningful output diversity. Moreover, existing ITC methods offer limited control over how to perform reasoning, which in turn limits their explainability. We present STATe-of-Thoughts (STATe), an interpretable ITC method that searches over high-level reasoning patterns. STATe replaces stochastic sampling with discrete and interpretable textual interventions: a controller selects actions encoding high-level reasoning choices, a generator produces reasoning steps conditioned on those choices, and an evaluator scores candidates to guide search. This structured approach yields three main advantages. First, action-guided textual interventions produce greater response diversity than temperature-based sampling. Second, in a case study on argument generation, STATe's explicit action sequences capture interpretable features that are highly predictive of output quality. Third, estimating the association between performance and action choices allows us to identify promising yet unexplored regions of the action space and steer generation directly toward them. Together, these results establish STATe as a practical framework for generating high-quality, diverse, and interpretable text. Our framework is available at https://github.com/zbambergerNLP/state-of-thoughts.

Motivation and Objectives

Motivate controllable, interpretable guidance over LLM reasoning for tasks requiring structured, diverse text generation.
Introduce an inference-time compute framework that replaces stochastic sampling with discrete action interventions.
Enable tracking of action trajectories to identify associations between reasoning patterns and output quality.
Demonstrate that action-guided generation improves diversity over high-temperature sampling in ITC settings.

Proposed Method

Define a fixed set of discrete action templates that encode high-level reasoning choices (structure and content dimensions).
Use a Plan→Generate→Evaluate→Select loop where the Controller selects actions, the Generator produces reasoning steps conditioned on actions, and the Evaluator scores candidates to guide beam search.
Support two evaluator types, verifiable rewards and LLM-as-a-Judge, for scoring intermediate and final states.
Implement early stopping via a FINISH action to avoid overthinking.
Track action traces along trajectories to enable association analyses between actions and performance (presence and sequential models).
Provide synthesis modes (Strict, Faithful, Restructured, Conclusion) to balance attribution with output quality.
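
The Plan→Generate→Evaluate→Select loop described above can be sketched as a small beam search. This is a minimal illustration under stated assumptions, not the authors' implementation: `State`, `state_search`, the toy controller, generator, and evaluator, and the three action names are all hypothetical stand-ins (in practice each component would wrap an LLM call or a verifiable reward), and none of them are taken from the project's repository.

```python
from dataclasses import dataclass, field
from typing import List

FINISH = "FINISH"  # early-stopping action, as described in the method list


@dataclass
class State:
    steps: List[str] = field(default_factory=list)    # generated reasoning steps
    actions: List[str] = field(default_factory=list)  # action trace for later analysis
    score: float = 0.0
    done: bool = False


def state_search(prompt, action_space, controller, generator, evaluator,
                 beam_width=2, max_depth=4):
    """Beam search over action-conditioned reasoning steps (illustrative)."""
    beam = [State()]
    for _ in range(max_depth):
        candidates = []
        for state in beam:
            if state.done:
                candidates.append(state)  # finished trajectories carry over unchanged
                continue
            for action in controller(prompt, state, action_space):  # Plan
                if action == FINISH:
                    child = State(list(state.steps), state.actions + [FINISH],
                                  state.score, True)
                else:
                    step = generator(prompt, state, action)          # Generate
                    child = State(state.steps + [step], state.actions + [action])
                    child.score = evaluator(prompt, child)           # Evaluate
                candidates.append(child)
        # Select: keep the top-scoring candidates (sorted() is stable on ties)
        beam = sorted(candidates, key=lambda s: s.score, reverse=True)[:beam_width]
        if all(s.done for s in beam):
            break
    return beam


# Toy stand-ins (assumptions): a real system would call an LLM here.
def toy_controller(prompt, state, action_space):
    proposals = [a for a in action_space if a not in state.actions]
    if len(state.steps) >= 2:
        proposals.append(FINISH)
    return proposals


def toy_generator(prompt, state, action):
    return f"step {len(state.steps) + 1} via {action}"


def toy_evaluator(prompt, state):
    return float(len(set(state.actions)))  # rewards using distinct actions


beams = state_search(
    "Should single-use plastics be banned?",
    ["define_terms", "give_evidence", "rebut_counterargument"],
    toy_controller, toy_generator, toy_evaluator,
)
for s in beams:
    print(s.score, s.actions)
```

Because every state carries its own `actions` list, the action trace needed for the association analyses is available for each returned trajectory without extra bookkeeping.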

Experimental Results

Research Questions

RQ1: Can discrete, action-guided interventions produce greater diversity than temperature-based sampling in ITC methods?
RQ2: Are action sequences predictive of output quality in argument generation and other tasks?
RQ3: Can modeling associations between actions and outcomes guide generation toward promising regions of the action space?

Key Findings

Method	Distinct (T=0.5)	Utility (T=0.5)	Distinct (T=0.7)	Utility (T=0.7)	Distinct (T=1.0)	Utility (T=1.0)
I/O	1.48 ± 0.03	2.28 ± 0.03	1.67 ± 0.04	2.45 ± 0.05	2.01 ± 0.05	2.64 ± 0.04
CoT	2.15 ± 0.07	2.65 ± 0.07	2.44 ± 0.07	2.85 ± 0.07	2.80 ± 0.08	3.07 ± 0.06
I/O w/ Action Space	1.74 ± 0.05	2.23 ± 0.05	2.01 ± 0.05	2.41 ± 0.07	2.44 ± 0.07	2.71 ± 0.04
CoT w/ Action Space	3.07 ± 0.11	3.15 ± 0.06	3.36 ± 0.12	3.28 ± 0.10	3.74 ± 0.09	3.49 ± 0.11
ToT	2.51 ± 0.09	2.87 ± 0.05	2.81 ± 0.08	3.07 ± 0.06	3.16 ± 0.12	3.32 ± 0.08
ToT w/ Action Space	3.16 ± 0.08	3.29 ± 0.05	3.25 ± 0.08	3.36 ± 0.07	3.61 ± 0.10	3.61 ± 0.11
STATe of Thoughts	4.87 ± 0.10	3.77 ± 0.12	5.02 ± 0.09	3.83 ± 0.11	5.39 ± 0.11	4.01 ± 0.11
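
As a quick reading aid for the table, the sketch below compares STATe's mean distinct-output count (the D columns) with the strongest baseline at each temperature. The numbers are copied from the table's means only (the ± terms are ignored), and the dictionary layout and variable names are illustrative assumptions.

```python
# Mean distinct outputs (D columns) copied from the table above.
distinct = {
    0.5: {"I/O": 1.48, "CoT": 2.15, "I/O w/ Action Space": 1.74,
          "CoT w/ Action Space": 3.07, "ToT": 2.51,
          "ToT w/ Action Space": 3.16, "STATe of Thoughts": 4.87},
    0.7: {"I/O": 1.67, "CoT": 2.44, "I/O w/ Action Space": 2.01,
          "CoT w/ Action Space": 3.36, "ToT": 2.81,
          "ToT w/ Action Space": 3.25, "STATe of Thoughts": 5.02},
    1.0: {"I/O": 2.01, "CoT": 2.80, "I/O w/ Action Space": 2.44,
          "CoT w/ Action Space": 3.74, "ToT": 3.16,
          "ToT w/ Action Space": 3.61, "STATe of Thoughts": 5.39},
}

summary = {}
for temperature, row in distinct.items():
    baselines = {m: v for m, v in row.items() if m != "STATe of Thoughts"}
    best = max(baselines, key=baselines.get)            # strongest baseline
    ratio = row["STATe of Thoughts"] / baselines[best]  # relative gain
    summary[temperature] = (best, round(ratio, 2))
    print(f"T={temperature}: best baseline = {best}, STATe / baseline = {ratio:.2f}")
```

On these means, STATe produces roughly 1.4 to 1.5 times as many distinct outputs as the best baseline at every temperature shown.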

STATe achieves higher diversity than baselines across model families and temperatures on NoveltyBench.
For Qwen3-30B-A3B-Instruct at T=0.7, STATe yields 5.02 mean distinct outputs vs 3.36 for CoT with Action Space (the strongest baseline), 3.25 for ToT with Action Space, and 2.44 for standard CoT.
Action sequences are highly predictive of argument quality in a case study on single-use plastic ban arguments.
Model-guided trajectory selection enables focusing on promising, less-explored regions of the action space.
Action-guided interventions produce greater diversity and maintain or improve output quality compared to high-temperature sampling.
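
To make the idea of an action-quality association concrete, here is a minimal presence-based sketch: for each action, it compares the mean score of trajectories that used the action with the mean score of those that did not. This is a plain difference in means, swapped in for illustration and not the paper's presence or sequential models; the function name `presence_effects`, the action names, and the scores are all made up.

```python
from statistics import mean
from typing import Dict, List, Tuple

Trajectory = Tuple[List[str], float]  # (action trace, evaluator score)


def presence_effects(trajectories: List[Trajectory]) -> Dict[str, float]:
    """Mean score with an action present minus mean score with it absent."""
    all_actions = sorted({a for actions, _ in trajectories for a in actions})
    effects = {}
    for action in all_actions:
        with_a = [s for actions, s in trajectories if action in actions]
        without_a = [s for actions, s in trajectories if action not in actions]
        if with_a and without_a:  # skip actions that appear everywhere or nowhere
            effects[action] = mean(with_a) - mean(without_a)
    return effects


# Hypothetical action traces and scores, for illustration only.
trajectories = [
    (["give_evidence", "rebut_counterargument"], 8.0),
    (["define_terms", "give_evidence"], 7.0),
    (["define_terms"], 4.0),
    (["rebut_counterargument"], 6.0),
    (["define_terms", "rebut_counterargument"], 5.0),
]

effects = presence_effects(trajectories)
for action, delta in sorted(effects.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{action}: {delta:+.2f}")
```

A controller could use estimates like these to prioritize actions with positive associations that have rarely been tried, which is the spirit of the model-guided trajectory selection noted above.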

This review was generated by AI and reviewed by human editors.