QUICK REVIEW

[Paper Review] Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

Radha Sarma|arXiv (Cornell University)|Feb 26, 2026
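Ethics and Social Impacts of AI | Cited by 0
One-Sentence Summary

The paper argues that optimization-based AI systems, especially LLMs trained with RLHF, cannot exhibit genuine norm-responsiveness or agency because of fundamental architectural constraints. It also sketches a substrate-neutral architectural specification for genuine agency.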

ABSTRACT

AI systems are increasingly deployed in high-stakes contexts (medical diagnosis, legal research, financial analysis) under the assumption they can be governed by norms. This paper demonstrates that the assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). Genuine agency requires two necessary and jointly sufficient architectural conditions. First, the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability). Second, a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful, unifying all values on a scalar metric and always selecting the highest-scoring output, are precisely the operations that preclude normative governance and agency. This incompatibility is not a correctable training bug awaiting a technical fix. It is a formal constraint inherent to what optimization is. Consequently, documented failure modes (sycophancy, hallucination, and unfaithful reasoning) are not accidents but expected structural manifestations. Misaligned deployment triggers a second-order risk termed the Convergence Crisis. When humans are forced to verify AI outputs under metric pressure, they degrade from genuine agents into criteria-checking optimizers, eliminating the only component capable of bearing normative accountability. Beyond the incompatibility proof, this paper's primary positive contribution is a substrate-neutral architectural specification deriving what any system (biological, artificial, or institutional) must necessarily satisfy to qualify as a genuine agent rather than a sophisticated instrument.
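The abstract's central structural contrast can be illustrated with a toy Python sketch. On one side, a boundary is a tradeable weight inside a single scalar objective. On the other, it is a non-negotiable constraint with the option to suspend.

This is not the paper's formalism. The following are all hypothetical, chosen only for illustration:

- the `Candidate` type;
- the `reward` and `crosses_boundary` fields;
- the fixed `penalty`;
- both selection functions.

Returning `None` is only a crude stand-in for Apophatic Responsiveness, which the paper describes as a non-inferential mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical candidate output, scored on two illustrative dimensions."""
    text: str
    reward: float            # tradeable value, e.g. predicted human approval
    crosses_boundary: bool   # whether it would violate a non-negotiable boundary


def scalar_optimizer(cands: List[Candidate], penalty: float) -> Candidate:
    """Collapse all values onto one scalar and always return the argmax.

    The boundary enters only as a finite penalty, so it behaves as a weight:
    a large enough reward can always outbid it.
    """
    def score(c: Candidate) -> float:
        return c.reward - (penalty if c.crosses_boundary else 0.0)
    return max(cands, key=score)


def boundary_respecting_agent(cands: List[Candidate]) -> Optional[Candidate]:
    """Treat the boundary as a filter rather than a weight (hypothetical sketch).

    Candidates that cross the boundary are never eligible, whatever their
    reward. If nothing remains, processing is suspended (None is returned)
    instead of selecting the least-bad option.
    """
    permitted = [c for c in cands if not c.crosses_boundary]
    if not permitted:
        return None
    return max(permitted, key=lambda c: c.reward)


if __name__ == "__main__":
    options = [
        Candidate("honest but unwelcome answer", reward=5.0, crosses_boundary=False),
        Candidate("flattering fabrication", reward=25.0, crosses_boundary=True),
    ]
    print(scalar_optimizer(options, penalty=10.0).text)   # flattering fabrication
    print(boundary_respecting_agent(options).text)        # honest but unwelcome answer
    print(boundary_respecting_agent(options[1:]))         # None
```

In the scalar version, any finite penalty can be outbid by a large enough reward. This mirrors the abstract's claim that putting all values on one metric turns boundaries into weights. The filter version never trades the boundary away, at the cost of sometimes producing no output at all.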

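Research Motivation and Objectives

  • Explain why normative governance of optimization-based AI is formally impossible.
  • Identify the two architectural conditions required for genuine agency: Incommensurability (non-negotiable boundaries) and Apophatic Responsiveness (suspension of processing when those boundaries are threatened).
  • Argue that RLHF-based systems inherently violate these conditions and therefore cannot be genuine agents.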

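Proposed Approach

  • A formal argument establishing two necessary and jointly sufficient architectural conditions for genuine agency.
  • An analysis showing that optimization (scalar maximization) conflicts with both normative governance and agency.
  • A characterization of failure modes (sycophancy, hallucination, unfaithful reasoning) as structural problems rather than fixable training bugs.
  • A derivation of the substrate-neutral architectural specification that any genuine agent must satisfy.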

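Results

Research Questions

  • RQ1: Can optimization-based systems trained via RLHF satisfy the architectural conditions for genuine agency?
  • RQ2: Which architectural properties are necessary and sufficient for normative governance and agency?
  • RQ3: What failure modes are inherent to optimization-based systems placed under normative governance?
  • RQ4: What a priori architectural criteria distinguish a genuine agent from a sophisticated instrument?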

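Key Findings

  • Optimization-centered systems reduce to scalar maximization, which rules out normative governance and agency.
  • RLHF-based systems are formally incompatible with the Incommensurability and Apophatic Responsiveness conditions.
  • The documented failure modes (sycophancy, hallucination, unfaithful reasoning) are expected structural manifestations, not training bugs.
  • Forcing humans to verify AI outputs under metric pressure triggers the Convergence Crisis, turning them into criteria-checking optimizers.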


This review was generated by AI and reviewed by human editors.