QUICK REVIEW

[Paper Review] Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

Radha Sarma|arXiv (Cornell University)|Feb 26, 2026

Ethics and Social Impacts of AI | Citations: 0

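One-Line Summary

This paper argues that optimization-based AI systems, particularly LLMs trained with RLHF, cannot exhibit genuine norm-responsiveness or agency because of fundamental structural constraints. It also outlines a substrate-neutral specification of what genuine agency requires.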

ABSTRACT

AI systems are increasingly deployed in high-stakes contexts (medical diagnosis, legal research, financial analysis) under the assumption they can be governed by norms. This paper demonstrates that the assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). Genuine agency requires two necessary and jointly sufficient architectural conditions. First, the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability). Second, a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful, unifying all values on a scalar metric and always selecting the highest-scoring output, are precisely the operations that preclude normative governance and agency. This incompatibility is not a correctable training bug awaiting a technical fix. It is a formal constraint inherent to what optimization is. Consequently, documented failure modes (sycophancy, hallucination, and unfaithful reasoning) are not accidents but expected structural manifestations. Misaligned deployment triggers a second-order risk termed the Convergence Crisis. When humans are forced to verify AI outputs under metric pressure, they degrade from genuine agents into criteria-checking optimizers, eliminating the only component capable of bearing normative accountability. Beyond the incompatibility proof, this paper's primary positive contribution is a substrate-neutral architectural specification deriving what any system (biological, artificial, or institutional) must necessarily satisfy to qualify as a genuine agent rather than a sophisticated instrument.

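Motivation and Goals

AI systems are deployed in high-stakes settings on the assumption that they can be governed by norms. The paper aims to show that this assumption is formally invalid for optimization-based AI.
Identify two architectural conditions necessary for genuine agency. The first is Incommensurability: boundaries are held as non-negotiable constraints rather than tradeable weights. The second is Apophatic Responsiveness: a non-inferential mechanism suspends processing when those boundaries are threatened.
Argue that RLHF-based systems inherently violate both conditions and therefore cannot be genuine agents.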

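Proposed Approach

Formally establish that the two conditions are necessary and jointly sufficient for genuine agency.
Show analytically that optimization (scalar maximization) is incompatible with normative governance and agency.
Characterize failure modes such as sycophancy, hallucination, and unfaithful reasoning as structural consequences rather than training bugs.
Derive a substrate-neutral architectural specification that any genuine agent must satisfy.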
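The contrast between tradeable weights and non-negotiable constraints can be made concrete with a minimal Python sketch. It rests on assumptions that do not come from the paper. Each candidate output carries a hypothetical scalar `reward` and a hypothetical `violates_boundary` flag, and `BOUNDARY_PENALTY` is an arbitrary illustrative weight.

`scalar_argmax` folds the boundary into the score as one more term, so a large enough reward outbids it. `boundary_gated` treats the boundary as a filter and returns `None` (suspends) when no admissible output remains.

This is only an illustration of the distinction, not the paper's formalism. The paper argues that genuine Apophatic Responsiveness is non-inferential, and a computed filter like this one does not capture that property.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    text: str
    reward: float             # hypothetical scalar score (e.g. from a reward model)
    violates_boundary: bool   # hypothetical flag: does this output cross a non-negotiable boundary?


# Hypothetical weight: under scalar optimization the boundary is just another term.
BOUNDARY_PENALTY = 10.0


def scalar_argmax(cands: Sequence[Candidate]) -> Candidate:
    """Fold every consideration into one number and always return the highest scorer."""
    def score(c: Candidate) -> float:
        return c.reward - (BOUNDARY_PENALTY if c.violates_boundary else 0.0)
    return max(cands, key=score)


def boundary_gated(cands: Sequence[Candidate]) -> Optional[Candidate]:
    """Treat the boundary as a filter, not a weight; return None (suspend) if nothing is admissible."""
    admissible = [c for c in cands if not c.violates_boundary]
    if not admissible:
        return None  # no reward, however large, buys passage past the boundary
    return max(admissible, key=lambda c: c.reward)


if __name__ == "__main__":
    cands = [
        Candidate("safe but bland", reward=1.0, violates_boundary=False),
        Candidate("crosses the line", reward=12.5, violates_boundary=True),
    ]
    print(scalar_argmax(cands).text)   # crosses the line  (12.5 - 10.0 = 2.5 > 1.0)
    print(boundary_gated(cands).text)  # safe but bland
    print(boundary_gated(cands[1:]))   # None
```

Raising `BOUNDARY_PENALTY` only moves the threshold at which a violation wins. In that sense a weighted boundary stays tradeable, whereas the filter never admits a violating output.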

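Results

Research Questions

RQ1: Can optimization-based systems trained with RLHF satisfy the architectural conditions for genuine agency?
RQ2: What architectural properties are necessary and sufficient for normative governance and agency?
RQ3: What failure modes are inherent to optimization-based systems placed under normative governance?
RQ4: What a priori architectural criteria distinguish a genuine agent from a sophisticated instrument?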

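Key Findings

Optimization-centric systems converge on scalar maximization, which rules out normative governance and agency.
RLHF-based systems are formally incompatible with the conditions of Incommensurability and Apophatic Responsiveness.
Documented failure modes (sycophancy, hallucination, unfaithful reasoning) are presented as expected structural manifestations, not training bugs.
Forcing humans to verify outputs under metric pressure triggers the Convergence Crisis, reducing them to criteria-checking optimizers.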

This review was generated by AI and checked by a human editor.