QUICK REVIEW

[Paper Review] Reasoning aligns language models to human cognition

Gonçalo Guiomar, Elia Torre | arXiv (Cornell University) | Feb 9, 2026

Embodied and Extended Cognition | Cited by 0

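ONE-SENTENCE SUMMARY

The paper proposes an active probabilistic reasoning task that separates sampling from inference, and shows that chain-of-thought reasoning mainly improves inference quality, bringing the decision strategies of large language models closer to human cognition, while sampling remains sub-optimal.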

ABSTRACT

Do language models make decisions under uncertainty like humans do, and what role does chain-of-thought (CoT) reasoning play in the underlying decision process? We introduce an active probabilistic reasoning task that cleanly separates sampling (actively acquiring evidence) from inference (integrating evidence toward a decision). Benchmarking humans and a broad set of contemporary large language models against near-optimal reference policies reveals a consistent pattern: extended reasoning is the key determinant of strong performance, driving large gains in inference and producing belief trajectories that become strikingly human-like, while yielding only modest improvements in active sampling. To explain these differences, we fit a mechanistic model that captures systematic deviations from optimal behavior via four interpretable latent variables: memory, strategy, choice bias, and occlusion awareness. This model places humans and models in a shared low-dimensional cognitive space, reproduces behavioral signatures across agents, and shows how chain-of-thought shifts language models toward human-like regimes of evidence accumulation and belief-to-choice mapping, tightening alignment in inference while leaving a persistent gap in information acquisition.

MOTIVATION AND GOALS

Introduce an active probabilistic reasoning task that disentangles sampling and inference.
Evaluate humans and a wide range of LLMs on the task under identical instructions.
Develop a mechanistic model with four latent variables to explain behavior across humans and models.
Assess how chain-of-thought reasoning shifts LLMs toward human-like cognitive strategies.

PROPOSED METHOD

Design an active probabilistic reasoning task with four buttons (A–D), one of which is biased toward RED, and use occlusions to manipulate the available evidence.
Have humans and LLMs complete a series of sampling rounds, followed by a final inference round in which they report which button they believe is biased.
Define a near-optimal reference agent that uses PPO for sampling and MAP inference for the final decision, as a performance benchmark.
Fit a mechanistic model with four latent variables (Memory $\beta$, Strategy $\kappa$, Choice Bias $\omega$, Occlusion Awareness $\theta$) to explain sampling and inference behavior.
Embed agents in a shared cognitive space defined by $\beta$ and $\kappa_f$ to compare human and model computations.
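The MAP inference step used by the reference agent can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a uniform prior over which button is biased, that the biased button emits RED with a known probability `p_red` (0.8 here is a hypothetical placeholder, since the summary does not state the value), and that unbiased buttons emit RED with probability 0.5. Under these assumptions every factor shared across hypotheses cancels, so each button's score depends only on its own RED/GREEN counts.

```python
import math
from typing import Dict, Tuple

# Hypothetical parameter: RED probability of the biased button (not given in the summary).
P_RED = 0.8

def log_scores(counts: Dict[str, Tuple[int, int]], p_red: float = P_RED) -> Dict[str, float]:
    """Unnormalised log-posterior that each button is the biased one.

    counts maps button -> (n_red, n_green). Assumes a uniform prior and that
    unbiased buttons emit RED with probability 0.5, so only the candidate's
    own counts matter: n_red*log(2p) + n_green*log(2(1-p)).
    """
    log_red = math.log(2 * p_red)
    log_green = math.log(2 * (1 - p_red))
    return {b: r * log_red + g * log_green for b, (r, g) in counts.items()}

def posterior(counts: Dict[str, Tuple[int, int]], p_red: float = P_RED) -> Dict[str, float]:
    """Normalised posterior over buttons (softmax of the log scores)."""
    scores = log_scores(counts, p_red)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {b: math.exp(s - top) for b, s in scores.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

def map_choice(counts: Dict[str, Tuple[int, int]], p_red: float = P_RED) -> str:
    """MAP decision: the button with the highest posterior probability."""
    scores = log_scores(counts, p_red)
    return max(scores, key=scores.get)

# Example: RED/GREEN counts after a few sampling rounds.
counts = {"A": (3, 1), "B": (1, 2), "C": (2, 2), "D": (0, 1)}
print(map_choice(counts))                # A
print(round(posterior(counts)["A"], 2))  # 0.61
```

Button A wins because it has the most net RED evidence. An unsampled button scores 0, the same as a button whose evidence is perfectly balanced.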

Figure 1: From task performance to latent cognitive variables. A: Task. We introduce an active probabilistic reasoning task in which agents sequentially sample from up to four buttons (A–D), each revealing a binary outcome (RED/GREEN). One button is biased toward RED, while the others are unbiased.
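A toy simulator for the task in Figure 1 is sketched below. The class name `ButtonTask`, the RED probability `p_red` of the biased button, and the occlusion mechanism (each outcome hidden independently with probability `p_occlude`) are all hypothetical; the summary does not specify how occlusions work in the actual task.

```python
import random
from typing import Optional

BUTTONS = ("A", "B", "C", "D")

class ButtonTask:
    """Toy four-button task (hypothetical parameters, not the paper's code).

    One button, chosen uniformly at random, emits RED with probability p_red;
    the other three emit RED with probability 0.5. As an assumed stand-in for
    occlusions, each outcome is hidden (returned as None) with probability p_occlude.
    """

    def __init__(self, p_red: float = 0.8, p_occlude: float = 0.0, seed: Optional[int] = None):
        self.rng = random.Random(seed)
        self.p_red = p_red
        self.p_occlude = p_occlude
        self.biased = self.rng.choice(BUTTONS)  # hidden from the agent

    def press(self, button: str) -> Optional[str]:
        """Sample one outcome from `button`: 'RED', 'GREEN', or None if occluded."""
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button!r}")
        p = self.p_red if button == self.biased else 0.5
        outcome = "RED" if self.rng.random() < p else "GREEN"
        if self.rng.random() < self.p_occlude:
            return None
        return outcome

    def check(self, guess: str) -> bool:
        """Score the final inference-round answer."""
        return guess == self.biased

# Example: six sampling rounds with 20% of outcomes occluded.
task = ButtonTask(p_occlude=0.2, seed=0)
history = [(b, task.press(b)) for b in "ABCDAB"]
print(history)
```

Keeping the environment separate from the agent makes it easy to plug in different sampling policies (random, heuristic, or learned) and compare them on identical trials by reusing the seed.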

EXPERIMENTAL RESULTS

Research Questions

RQ1: Do language models make decisions under uncertainty in a human-like way, and what role does chain-of-thought reasoning play?
RQ2: How do sampling and inference each contribute to performance, and does CoT mainly improve one over the other?
RQ3: Can a mechanistic latent-variable model place human and LLM decision strategies in a shared space?
RQ4: To what extent does CoT reasoning move LLMs toward human-like inference and away from non-human sampling patterns?

Key Findings

Extended reasoning substantially improves inference quality but yields only modest gains in sampling quality.
Inference gains from CoT move LLMs closer to humans in the shared cognitive space.
Some reasoning models match or exceed human inference quality but still underperform human sampling.
A four-parameter latent space (Memory, Strategy, Choice Bias, Occlusion Awareness) captures deviations from optimal Bayesian behavior across humans and LLMs.
CoT reasoning shifts LLMs toward near-optimal memory updates and MAP-like final decisions, yet sampling remains sub-optimal compared to skilled humans.
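To make the Memory and Choice Bias variables concrete, here is a hypothetical two-parameter agent. It is not the paper's fitted model, whose equations the summary does not give: `beta` is assumed to act as a per-step decay on accumulated log-evidence (beta = 1 means no forgetting), and `omega` is assumed to be an inverse temperature mapping beliefs to choices (large omega approaches a MAP decision). With beta = 1 and omega = 1 the choice probabilities equal the exact Bayesian posterior under a uniform prior. Strategy ($\kappa$) and Occlusion Awareness ($\theta$) are not modeled; occluded outcomes (None) are simply skipped. `p_red = 0.8` is again a placeholder.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

BUTTONS = ("A", "B", "C", "D")

def leaky_evidence(samples: Iterable[Tuple[str, Optional[str]]],
                   beta: float = 1.0, p_red: float = 0.8) -> Dict[str, float]:
    """Per-button log-evidence with hypothetical memory decay `beta`.

    Each step, all stored evidence is multiplied by beta (beta = 1: perfect
    memory), then the new outcome's log-likelihood ratio is added to the
    sampled button. Occluded outcomes (None) add nothing.
    """
    llr = {"RED": math.log(2 * p_red), "GREEN": math.log(2 * (1 - p_red))}
    evidence = {b: 0.0 for b in BUTTONS}
    for button, outcome in samples:
        for b in BUTTONS:
            evidence[b] *= beta
        if outcome is not None:
            evidence[button] += llr[outcome]
    return evidence

def choice_probs(evidence: Dict[str, float], omega: float = 1.0) -> Dict[str, float]:
    """Softmax belief-to-choice mapping; omega is an assumed inverse temperature."""
    top = max(evidence.values())
    weights = {b: math.exp(omega * (e - top)) for b, e in evidence.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

samples = [("A", "RED"), ("A", "RED"), ("B", "GREEN"), ("C", "RED")]
for beta in (1.0, 0.5):
    ev = leaky_evidence(samples, beta=beta)
    probs = choice_probs(ev, omega=5.0)
    print(beta, max(probs, key=probs.get))  # beta=1.0 -> A; beta=0.5 -> C
```

With beta = 0.5 older evidence fades, so the single recent RED on C outweighs the two earlier REDs on A: a leaky memory produces a recency bias that a perfect-memory agent does not show.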

Figure 2: Comparing human and LLM behavior. A: Task performance. Average success rate across trial lengths $N\in\{2,\dots,15\}$. We report human performance (green), split into the lower $75\%$ and top $25\%$ of participants, and the near-optimal reference agent (PPO sampling + MAP inference) (light blue)…
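Figure 2A plots success rate against trial length N. The sketch below estimates the same kind of curve for a deliberately simple baseline: uniform random sampling followed by MAP inference. It is not the paper's PPO reference agent, and `p_red = 0.8`, the trial count, and the seed are hypothetical choices, so its numbers are not comparable to the paper's.

```python
import math
import random
from typing import Dict, Iterable

BUTTONS = ("A", "B", "C", "D")

def run_trial(n_samples: int, p_red: float, rng: random.Random) -> bool:
    """One trial: uniform random sampling (not PPO), then MAP inference."""
    biased = rng.choice(BUTTONS)
    counts = {b: [0, 0] for b in BUTTONS}  # button -> [n_red, n_green]
    for _ in range(n_samples):
        b = rng.choice(BUTTONS)
        is_red = rng.random() < (p_red if b == biased else 0.5)
        counts[b][0 if is_red else 1] += 1
    # MAP under a uniform prior: score = n_red*log(2p) + n_green*log(2(1-p)).
    log_red, log_green = math.log(2 * p_red), math.log(2 * (1 - p_red))
    scores = {b: r * log_red + g * log_green for b, (r, g) in counts.items()}
    best = max(scores.values())
    tied = [b for b, s in scores.items() if s == best]
    return rng.choice(tied) == biased  # break ties uniformly at random

def success_curve(ns: Iterable[int] = range(2, 16), p_red: float = 0.8,
                  n_trials: int = 2000, seed: int = 0) -> Dict[int, float]:
    """Fraction of correct final guesses for each trial length N."""
    rng = random.Random(seed)
    return {n: sum(run_trial(n, p_red, rng) for _ in range(n_trials)) / n_trials
            for n in ns}

curve = success_curve()
for n, rate in curve.items():
    print(f"N={n:2d}  success={rate:.3f}")
```

Success should rise with N as evidence accumulates, but a uniform policy wastes samples on buttons that already look unbiased. A learned sampling policy such as PPO can instead direct samples toward the most informative buttons, which is the "sampling" dimension the paper separates from inference.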


This review was generated by AI and checked by human editors.