QUICK REVIEW

[Paper Review] Thinking in Latents: Adaptive Anchor Refinement for Implicit Reasoning in LLMs

Disha Sheshanarayana, Rajat Subhra Pal|arXiv (Cornell University)|2026. 03. 16.

Topic Modeling | Citations: 0

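One-Line Summary

AdaAnchor iteratively updates a small set of anchor vectors and halts adaptively once the anchor dynamics converge, which cuts output tokens relative to token-level reasoning and improves the accuracy-efficiency trade-off over fixed-step latent refinement.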

ABSTRACT

Token-level Chain-of-Thought (CoT) prompting has become a standard way to elicit multi-step reasoning in large language models (LLMs), especially for mathematical word problems. However, generating long intermediate traces increases output length and inference cost, and can be inefficient when the model could arrive at the correct answer without extensive verbalization. This has motivated latent-space reasoning approaches that shift computation into hidden representations and only emit a final answer. Yet, many latent reasoning methods depend on a fixed number of latent refinement steps at inference, adding another hyperparameter that must be tuned across models and datasets to balance accuracy and efficiency. We introduce AdaAnchor, a latent reasoning framework that performs silent iterative computation by refining a set of latent anchor vectors attached to the input. AdaAnchor further incorporates an adaptive halting mechanism that monitors anchor stability across iterations and terminates refinement once the anchor dynamics converge, allocating fewer steps to easier instances while reserving additional refinement steps for harder ones under a shared maximum-step budget. Our empirical evaluation across three mathematical word-problem benchmarks shows that AdaAnchor with adaptive halting yields accuracy gains of up to 5% over fixed-step latent refinement while reducing average latent refinement steps by 48-60% under the same maximum-step budget. Compared to standard reasoning baselines, AdaAnchor achieves large reductions in generated tokens (92-93%) by moving computation into silent latent refinement, offering a different accuracy-efficiency trade-off with substantially lower output-token usage.

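Research Motivation and Goals

Reduce the cost of token-level reasoning by moving computation into latent space.
Introduce AdaAnchor, a latent reasoning framework that refines a small set of anchor vectors at inference time.
Develop an adaptive halting mechanism that terminates refinement based on anchor stability.
Evaluate AdaAnchor on mathematical word problems against fixed-step latent methods and explicit Chain-of-Thought baselines.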

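Proposed Method

Prepend m learnable anchor vectors to the input embedding sequence.
Refine the anchors iteratively: after each forward pass, update the anchor slots from the base network's hidden states, using a smoothing parameter β.
Terminate refinement with a stability-based halting rule that detects convergence of the anchor dynamics across iterations.
After refinement ends, decode only the final answer (answer-only decoding).
Compare adaptive halting against fixed-step latent refinement under a shared maximum latent budget Kmax.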
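
The refinement loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not give the exact update or halting formulas, so the exponential-moving-average update with β, the relative-change test, and the names `toy_hidden_states`, `refine_anchors`, `tol`, and `patience` are all hypothetical. A small `tanh` map stands in for the base LLM's forward pass.

```python
import numpy as np


def toy_hidden_states(anchors, context, weight):
    """Hypothetical stand-in for the base network's forward pass.

    Returns one hidden state per anchor slot, with the same shape as `anchors`.
    """
    return np.tanh(anchors @ weight + context)


def refine_anchors(anchors, context, weight, beta=0.5, k_max=8,
                   tol=1e-2, patience=2):
    """Refine anchors until they stabilize or the k_max budget is spent.

    Assumed update:  a <- beta * a + (1 - beta) * h
    Assumed halting: relative change below `tol` for `patience`
                     consecutive iterations.
    Returns the refined anchors and the number of steps actually used.
    """
    stable_iters = 0
    for step in range(1, k_max + 1):
        hidden = toy_hidden_states(anchors, context, weight)
        updated = beta * anchors + (1.0 - beta) * hidden
        # Relative size of the update, guarded against division by zero.
        change = np.linalg.norm(updated - anchors) / (
            np.linalg.norm(anchors) + 1e-8
        )
        anchors = updated
        stable_iters = stable_iters + 1 if change < tol else 0
        if stable_iters >= patience:
            return anchors, step  # early stop: anchors have stabilized
    return anchors, k_max  # budget exhausted without stabilizing


# Example call with m = 4 anchors of dimension d = 16 (arbitrary toy sizes).
rng = np.random.default_rng(0)
anchors0 = rng.normal(size=(4, 16))
context = rng.normal(size=(16,))
weight = 0.1 * rng.normal(size=(16, 16))
refined, steps_used = refine_anchors(anchors0, context, weight)
print("steps used:", steps_used)
```

Setting `tol=0.0` disables early stopping and reproduces the fixed-step setting with exactly `k_max` iterations, which is the comparison the shared budget Kmax is meant to make fair.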

Figure 1: Comparison of AdaAnchor with explicit Chain-of-Thought (CoT) reasoning. CoT generates long intermediate reasoning tokens, whereas AdaAnchor performs implicit multi-step computation by refining latent anchor vectors and uses stability-based early stopping before answer-only decoding.

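Experimental Results

Research Questions

RQ1: Can anchor refinement enable multi-step implicit reasoning without verbalizing intermediate steps?
RQ2: Does adaptive halting based on anchor stability improve the accuracy-efficiency trade-off under a fixed compute budget?
RQ3: How does AdaAnchor perform on standard mathematical word-problem benchmarks compared with token-based and fixed-step latent methods?

Key Results

Dataset	Model	Method	Accuracy (%)	Avg. Tokens	Avg. Steps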
GSM8K	Qwen2.5-1.5B	No CoT	13.0	2.16	–
GSM8K	Qwen2.5-1.5B	CoT	20.0	28.27	–
GSM8K	Qwen2.5-1.5B	iCoT	12.23	2.36	–
GSM8K	(not stated)	AdaAnchor (K=8)	16.0	2.73	8
SVAMP	(not stated)	AdaAnchor (K=8)	50.5	2.12	8
MultiArith	(not stated)	AdaAnchor (K=8)	27.6	2.34	8
GSM8K	Llama-3.2-1B	No CoT	10.5	2.98	–
SVAMP	Llama-3.2-1B	No CoT	38.2	2.10	–
MultiArith	Llama-3.2-1B	No CoT	20.56	2.08	–
GSM8K	Llama-3.2-1B	CoT	23.2	25.4	–
SVAMP	Llama-3.2-1B	CoT	57.8	28.21	–
MultiArith	Llama-3.2-1B	CoT	43.33	28.0	–
GSM8K	Llama-3.2-1B	iCoT	11.7	2.25	–
SVAMP	Llama-3.2-1B	iCoT	54.2	2.43	–
MultiArith	Llama-3.2-1B	iCoT	30.84	2.12	–
GSM8K	Llama-3.2-1B	AdaAnchor (K=8)	14.0	2.89	8
SVAMP	Llama-3.2-1B	AdaAnchor (K=8)	52.0	2.13	8
MultiArith	Llama-3.2-1B	AdaAnchor (K=8)	28.31	2.48	8
GSM8K	Llama-3.2-1B	AdaAnchor adaptive	17.2	2.45	3.5
SVAMP	Llama-3.2-1B	AdaAnchor adaptive	53.4	2.8	3.1
MultiArith	Llama-3.2-1B	AdaAnchor adaptive	32.44	2.57	3.5

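AdaAnchor with adaptive halting improves accuracy by up to 5% over fixed-step latent refinement under the same maximum-step budget.
Adaptive halting reduces the average number of latent refinement steps by 48–60% compared with fixed-step refinement.
Because computation happens in latent space, output-token usage drops sharply: a 92–93% reduction relative to token-based reasoning baselines.
Relative to the No-CoT baseline, AdaAnchor with adaptive halting is more accurate on GSM8K, SVAMP, and MultiArith at a similar output length (for Llama-3.2-1B: 17.2 vs. 10.5, 53.4 vs. 38.2, and 32.44 vs. 20.56). Explicit CoT remains more accurate in the table above, so against CoT the gain is in output tokens rather than accuracy.
Fixed step budgets tend to show diminishing returns, which motivates an adaptive halting strategy.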
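
As a quick sanity check on how such percentages are formed, the snippet below computes relative reductions from the Llama-3.2-1B rows of the results table above. It is only an arithmetic sketch: the helper `relative_reduction` is a hypothetical name, and the per-dataset figures it prints need not coincide with the aggregate 48–60% and 92–93% ranges, since this review does not state how the paper aggregates across models and datasets.

```python
# Values copied from the Llama-3.2-1B rows of the results table.
K_MAX = 8  # fixed-step budget used by "AdaAnchor (K=8)"

rows = {
    "GSM8K": {"cot_tokens": 25.4, "ada_tokens": 2.45, "ada_steps": 3.5},
    "SVAMP": {"cot_tokens": 28.21, "ada_tokens": 2.8, "ada_steps": 3.1},
    "MultiArith": {"cot_tokens": 28.0, "ada_tokens": 2.57, "ada_steps": 3.5},
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (baseline - value) / baseline


for dataset, r in rows.items():
    step_cut = relative_reduction(K_MAX, r["ada_steps"])
    token_cut = relative_reduction(r["cot_tokens"], r["ada_tokens"])
    print(f"{dataset}: steps -{step_cut:.1f}%, tokens vs. CoT -{token_cut:.1f}%")
```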

Figure 2: Overview of AdaAnchor. AdaAnchor prepends $m$ learnable latent anchor vectors to the input embedding sequence (left), iteratively refines them via repeated forward passes and anchor-slot updates (middle), and uses a stability-based criterion to halt early before performing answer-only deco

This review was generated by AI and reviewed by a human editor.