QUICK REVIEW

[Paper Review] On Emergences of Non-Classical Statistical Characteristics in Classical Neural Networks

Hanyu Zhao, Yang Wu|arXiv (Cornell University)|Feb 27, 2026

Advanced Thermodynamics and Statistical Mechanics | Cited by 0

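One-Sentence Summary

This paper proposes NCnet, a classical neural network that produces non-classical CHSH statistics through gradient competition in multi-task learning; within a specific capacity range, S sometimes exceeds the classical bound, and S correlates with generalization.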

ABSTRACT

Inspired by measurement incompatibility and Bell-family inequalities in quantum mechanics, we propose the Non-Classical Network (NCnet), a simple classical neural architecture that stably exhibits non-classical statistical behaviors under typical and interpretable experimental setups. We find non-classicality, measured by the $S$ statistic of CHSH inequality, arises from gradient competitions of hidden-layer neurons shared by multi-tasks. Remarkably, even without physical links supporting explicit communication, one task head can implicitly sense the training task of other task heads via local loss oscillations, leading to non-local correlations in their training outcomes. Specifically, in the low-resource regime, the value of $S$ increases gradually with increasing resources and approaches toward its classical upper-bound 2, which implies that underfitting is alleviated with resources increase. As the model nears the critical scale required for adequate performance, $S$ may temporarily exceed 2. As resources continue to grow, $S$ then asymptotically decays down to and fluctuates around 2. Empirically, when model capacity is insufficient, $S$ is positively correlated with generalization performance, and the regime where $S$ first approaches $2$ often corresponding to good generalization. Overall, our results suggest that non-classical statistics can provide a novel perspective for understanding internal interactions and training dynamics of deep networks.

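Motivation and Objectives

Motivate the analysis of internal interactions in neural networks from the perspective of measurement incompatibility, and explain why it matters.
Propose a simple classical architecture (NCnet) that exhibits non-classical statistical behavior in a multi-task setting.
Quantify non-classical correlations with the CHSH statistic and study how it depends on model capacity and training dynamics.
Provide mechanistic insight into how gradient competition over shared representations drives non-local correlations.
Explore the relevance of CHSH-based diagnostics for understanding the representational capacity and generalization of real-world models.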

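Proposed Method

Define NCnet as a shared hidden layer with two task-specific heads, reflecting a multi-task setting.
Map the tasks on the Alice and Bob sides to the CHSH outputs A_i and B_j, and compute the correlations C(A_i, B_j).
Compute the CHSH statistic S = C(A1,B1) + C(A1,B2) + C(A2,B1) - C(A2,B2) and compare it with the classical bound 2 and the Tsirelson bound 2√2 ≈ 2.828.
In a controlled, XORnet-inspired setting, study how S varies with the hidden-layer size n (n = 2, 3, 4).
Extend to real-world architectures (multilingual BERT and BERT with LoRA) and multi-task datasets to verify the non-classical behavior in practice.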
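
The correlation and S computations in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes each outcome has already been encoded as +1 or -1 (this review does not describe how the paper maps training outcomes to such values), and the sample data are made-up toy values chosen only to exercise the formula. The names `correlation`, `chsh_s`, and `runs` are hypothetical.

```python
from math import sqrt

CLASSICAL_BOUND = 2.0
TSIRELSON_BOUND = 2.0 * sqrt(2.0)  # approximately 2.828


def correlation(a_outcomes, b_outcomes):
    """Empirical C(A, B): mean product of paired +1/-1 outcomes."""
    if len(a_outcomes) != len(b_outcomes) or not a_outcomes:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a * b for a, b in zip(a_outcomes, b_outcomes)) / len(a_outcomes)


def chsh_s(c11, c12, c21, c22):
    """S = C(A1,B1) + C(A1,B2) + C(A2,B1) - C(A2,B2)."""
    return c11 + c12 + c21 - c22


# Toy outcomes per setting pair (i, j); NOT data from the paper.
runs = {
    (1, 1): ([1, 1, -1, 1], [1, 1, -1, -1]),
    (1, 2): ([1, -1, 1, 1], [1, -1, -1, 1]),
    (2, 1): ([-1, 1, 1, -1], [-1, 1, -1, -1]),
    (2, 2): ([1, -1, 1, -1], [-1, 1, -1, 1]),
}

# One correlation per setting pair, then combine them into S.
c = {pair: correlation(a, b) for pair, (a, b) in runs.items()}
s_value = chsh_s(c[(1, 1)], c[(1, 2)], c[(2, 1)], c[(2, 2)])

print(f"S = {s_value:.3f}")
print("exceeds classical bound:", s_value > CLASSICAL_BOUND)
print("within Tsirelson bound:", s_value <= TSIRELSON_BOUND)
```

With these toy values the three positive correlations are 0.5 each and C(A2,B2) is -1, so S comes out to 2.5, between the classical bound and the Tsirelson bound.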

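Experimental Results

Research Questions

RQ1: Can classical neural networks exhibit non-classical statistical correlations in a CHSH test, analogous to Bell-type violations?
RQ2: How do task structure and shared representations give rise to CHSH violations under gradient competition?
RQ3: How does model capacity (number of hidden units or LoRA dimension) affect the CHSH statistic and training dynamics?
RQ4: Is non-classicality related to generalization performance in multi-task learning, and does it persist in real-world models?
RQ5: Can CHSH-based diagnostics serve as an auxiliary tool for analyzing internal coupling and capacity in neural networks?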

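Key Findings

For certain numbers of hidden units in NCnet, S can exceed the classical bound 2, indicating non-classical correlations.
The CHSH statistic S peaks near the critical capacity (e.g., n = 3 in the NCnet setting) and then falls back to around 2 as capacity increases further.
Non-classicality is driven by gradient competition arising from parameter sharing, not by an explicit communication channel.
In real-world-style experiments with LoRA, S rises with capacity under multilingual training, but whether it exceeds 2 under mixed tasks depends on task difficulty and balance.
Generalization and S are positively correlated in the low-to-moderate capacity range, consistent with S approaching 2 near the optimal capacity.
In this setting, S > 2 is a sufficient condition for non-classicality, suggesting broader applicability of CHSH as a diagnostic tool for neural networks.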
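
Why S > 2 suffices as a signature of non-classicality can be checked with the standard textbook argument for the CHSH bound, sketched below. This is not the paper's own derivation. The sketch enumerates every deterministic local assignment, in which each of A1, A2, B1, B2 has a fixed value of +1 or -1 independent of the other side's setting, and confirms that S is always +2 or -2. The function names are hypothetical.

```python
from itertools import product


def chsh_s(a1, a2, b1, b2):
    """S for one deterministic assignment of +1/-1 values to A1, A2, B1, B2."""
    return a1 * b1 + a1 * b2 + a2 * b1 - a2 * b2


def deterministic_s_values():
    """All values S takes over the 16 deterministic local assignments."""
    return sorted({chsh_s(*values) for values in product((-1, 1), repeat=4)})


s_values = deterministic_s_values()
print("possible S values:", s_values)
print("max |S| over deterministic strategies:", max(abs(s) for s in s_values))
```

Any classical mixture of these assignments (shared randomness) yields an average of values in {-2, +2}, so it stays within [-2, 2]. An observed S above 2 therefore cannot be reproduced by a model of this form.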


This review was generated by AI and reviewed by human editors.