QUICK REVIEW

[Paper Review] STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts

Zachary Bamberger, Till R. Saenger|arXiv (Cornell University)|2026. 02. 15.

Topic Modeling | Citations: 0

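One-Line Summary

STATe replaces temperature-based diversity with discrete, interpretable action templates that guide Tree-of-Thoughts reasoning, yielding greater diversity and interpretable associations between actions and output quality.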

ABSTRACT

Inference-Time-Compute (ITC) methods like Best-of-N and Tree-of-Thoughts are meant to produce output candidates that are both high-quality and diverse, but their use of high-temperature sampling often fails to achieve meaningful output diversity. Moreover, existing ITC methods offer limited control over how to perform reasoning, which in turn limits their explainability. We present STATe-of-Thoughts (STATe), an interpretable ITC method that searches over high-level reasoning patterns. STATe replaces stochastic sampling with discrete and interpretable textual interventions: a controller selects actions encoding high-level reasoning choices, a generator produces reasoning steps conditioned on those choices, and an evaluator scores candidates to guide search. This structured approach yields three main advantages. First, action-guided textual interventions produce greater response diversity than temperature-based sampling. Second, in a case study on argument generation, STATe's explicit action sequences capture interpretable features that are highly predictive of output quality. Third, estimating the association between performance and action choices allows us to identify promising yet unexplored regions of the action space and steer generation directly toward them. Together, these results establish STATe as a practical framework for generating high-quality, diverse, and interpretable text. Our framework is available at https://github.com/zbambergerNLP/state-of-thoughts.

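Research Motivation and Goals

Motivate controllable and interpretable guidance of LLM reasoning for tasks that require structured and diverse text generation.
Introduce an inference-time compute (ITC) framework that replaces stochastic sampling with discrete action interventions.
Enable action-trajectory tracking to identify associations between reasoning patterns and output quality.
Show that action-guided generation improves diversity over high-temperature sampling in ITC settings.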

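Proposed Method

Define a fixed set of discrete action templates that encode structure and content dimensions.
Run a Plan→Generate→Evaluate→Select loop: the Controller selects actions, the Generator produces reasoning steps conditioned on those actions, and the Evaluator scores candidates to guide beam search.
Support two evaluator types, verifiable rewards and LLM-as-a-Judge, for scoring intermediate and final states.
Implement early termination with a FINISH action to avoid overthinking.
Track action trajectories to enable association analysis between actions and performance (presence and sequential models).
Provide synthesis modes (Strict, Faithful, Restructured, Conclusion) to balance attribution against output quality.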
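
The Plan→Generate→Evaluate→Select loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the action names, the toy `controller`, `generator`, and `evaluator` functions, and the scoring rule are all assumptions made for this example, and a real system would replace the generator and evaluator with LLM calls or verifiable rewards.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical action templates; the paper's actual action space is not listed in this review.
ACTIONS = ["CLAIM", "EVIDENCE", "REBUTTAL", "FINISH"]


@dataclass
class State:
    actions: Tuple[str, ...] = ()  # action trajectory so far (kept for later analysis)
    text: str = ""                 # reasoning text produced so far
    score: float = 0.0             # evaluator score of this state

    @property
    def finished(self) -> bool:
        return bool(self.actions) and self.actions[-1] == "FINISH"


def controller(state: State) -> List[str]:
    """Plan: propose candidate actions (toy rule: do not repeat the previous action)."""
    last = state.actions[-1] if state.actions else None
    return [a for a in ACTIONS if a != last]


def generator(state: State, action: str) -> str:
    """Generate: stand-in for an LLM call conditioned on the chosen action."""
    return state.text + f"[{action}] "


def evaluator(actions: Tuple[str, ...]) -> float:
    """Evaluate: toy score that rewards distinct content actions, minus a length penalty."""
    content = [a for a in actions if a != "FINISH"]
    return len(set(content)) - 0.1 * len(actions)


def state_search(beam_width: int = 2, max_depth: int = 4) -> List[State]:
    beam = [State()]
    for _ in range(max_depth):
        candidates = []
        for s in beam:
            if s.finished:
                candidates.append(s)  # early-terminated trajectories are carried forward unchanged
                continue
            for a in controller(s):
                acts = s.actions + (a,)
                candidates.append(State(acts, generator(s, a), evaluator(acts)))
        # Select: keep the top-scoring candidates as the next beam.
        beam = sorted(candidates, key=lambda s: s.score, reverse=True)[:beam_width]
        if all(s.finished for s in beam):
            break
    return beam


if __name__ == "__main__":
    for s in state_search():
        print(s.actions, round(s.score, 2))
```

Because every state stores its full action trajectory, the same records can later feed the association analysis between actions and scores.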

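Experimental Results

Research Questions

RQ1: Can discrete, action-oriented interventions in ITC methods produce greater diversity than temperature-based sampling?
RQ2: Can action sequences predict output quality in argument generation and other tasks?
RQ3: Can modeling the association between actions and outcomes guide generation toward promising regions of the action space?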

Key Results

Method	D (T=0.5)	U (T=0.5)	D (T=0.7)	U (T=0.7)	D (T=1.0)	U (T=1.0)
I/O	1.48 ± 0.03	2.28 ± 0.03	1.67 ± 0.04	2.45 ± 0.05	2.01 ± 0.05	2.64 ± 0.04
CoT	2.15 ± 0.07	2.65 ± 0.07	2.44 ± 0.07	2.85 ± 0.07	2.80 ± 0.08	3.07 ± 0.06
I/O w/ Action Space	1.74 ± 0.05	2.23 ± 0.05	2.01 ± 0.05	2.41 ± 0.07	2.44 ± 0.07	2.71 ± 0.04
CoT w/ Action Space	3.07 ± 0.11	3.15 ± 0.06	3.36 ± 0.12	3.28 ± 0.10	3.74 ± 0.09	3.49 ± 0.11
ToT	2.51 ± 0.09	2.87 ± 0.05	2.81 ± 0.08	3.07 ± 0.06	3.16 ± 0.12	3.32 ± 0.08
ToT w/ Action Space	3.16 ± 0.08	3.29 ± 0.05	3.25 ± 0.08	3.36 ± 0.07	3.61 ± 0.10	3.61 ± 0.11
STATe of Thoughts	4.87 ± 0.10	3.77 ± 0.12	5.02 ± 0.09	3.83 ± 0.11	5.39 ± 0.11	4.01 ± 0.11
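
To make the size of the gap concrete, the short sketch below computes the ratio of STATe's D values to those of ToT w/ Action Space at each temperature. The numbers are copied from the table above; the helper name `relative_gain` is introduced only for this example, and the calculation ignores the reported ± intervals.

```python
# D values copied from the table above, in the order T=0.5, T=0.7, T=1.0.
distinct = {
    "ToT w/ Action Space": (3.16, 3.25, 3.61),
    "STATe of Thoughts": (4.87, 5.02, 5.39),
}
temperatures = (0.5, 0.7, 1.0)


def relative_gain(method: str, baseline: str) -> dict:
    """Ratio of D for `method` over D for `baseline`, per temperature, rounded to 2 decimals."""
    return {
        t: round(m / b, 2)
        for t, m, b in zip(temperatures, distinct[method], distinct[baseline])
    }


gains = relative_gain("STATe of Thoughts", "ToT w/ Action Space")
print(gains)  # {0.5: 1.54, 0.7: 1.54, 1.0: 1.49}
```

On these figures, STATe's D is roughly 1.5 times that of ToT w/ Action Space at every temperature shown.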

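STATe achieves higher diversity on NoveltyBench regardless of model family and temperature.
With Qwen3-30B-A3B-Instruct at T=0.7, STATe averages 5.02 distinct outputs, compared with 3.25 for ToT w/ Action Space, 3.36 for CoT w/ Action Space, and 2.44 for standard CoT (column D in the table above).
In a case study on arguments about a single-use plastics ban, action sequences are highly predictive of argument quality.
Model-guided trajectory selection focuses generation on promising but under-explored regions of the action space.
Action-guided interventions yield greater diversity than high-temperature sampling while maintaining or improving output quality.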
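
One simple way to picture an association between action choices and quality is a presence-based comparison: for each action, compare the mean score of trajectories that contain it with the mean score of those that do not. The sketch below is a simplified stand-in under that assumption, not the paper's presence or sequential model; the action names and scores are made-up toy values used only to show the mechanics.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Trajectory = Tuple[Sequence[str], float]  # (action sequence, evaluator score)


def presence_effects(trajectories: List[Trajectory]) -> Dict[str, float]:
    """Difference in mean score between trajectories with and without each action."""
    actions = sorted({a for seq, _ in trajectories for a in seq})
    effects = {}
    for a in actions:
        with_a = [q for seq, q in trajectories if a in seq]
        without_a = [q for seq, q in trajectories if a not in seq]
        if with_a and without_a:  # skip actions that are always or never present
            effects[a] = mean(with_a) - mean(without_a)
    return effects


# Toy, illustrative data only (not results from the paper).
toy = [
    (("CLAIM", "EVIDENCE"), 0.8),
    (("CLAIM", "REBUTTAL"), 0.6),
    (("CLAIM",), 0.4),
    (("EVIDENCE", "REBUTTAL"), 0.9),
]

effects = presence_effects(toy)
best_action = max(effects, key=effects.get)
print(best_action, round(effects[best_action], 2))
```

Estimates of this kind could then be used to prioritize action combinations that look promising but have rarely been tried, which is the steering idea the results describe.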


This review was generated by AI and reviewed by a human editor.