QUICK REVIEW

[Paper Review] Structural Disentanglement in Bilinear MLPs via Architectural Inductive Bias

Ojasva Nema, Kaustubh Sharma|arXiv (Cornell University)|Feb 5, 2026

Model Reduction and Neural Networks | Cited by 0

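One-Sentence Summary

The paper argues that architectural inductive bias, specifically bilinear multilayer perceptrons (MLPs) with multiplicative interactions, helps learned operators become structurally disentangled. This improves model editing (unlearning) and long-horizon extrapolation, and is supported by both analytical results and controlled experiments.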

ABSTRACT

Selective unlearning and long-horizon extrapolation remain fragile in modern neural networks, even when tasks have underlying algebraic structure. In this work, we argue that these failures arise not solely from optimization or unlearning algorithms, but from how models structure their internal representations during training. We explore if having explicit multiplicative interactions as an architectural inductive bias helps in structural disentanglement, through Bilinear MLPs. We show analytically that bilinear parameterizations possess a `non-mixing' property under gradient flow conditions, where functional components separate into orthogonal subspace representations. This provides a mathematical foundation for surgical model modification. We validate this hypothesis through a series of controlled experiments spanning modular arithmetic, cyclic reasoning, Lie group dynamics, and targeted unlearning benchmarks. Unlike pointwise nonlinear networks, multiplicative architectures are able to recover true operators aligned with the underlying algebraic structure. Our results suggest that model editability and generalization are constrained by representational structure, and that architectural inductive bias plays a central role in enabling reliable unlearning.

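Research Motivation and Goals

Emphasize that representational structure matters for generalization and for selective unlearning, beyond what training accuracy alone reveals.
Characterize structural disentanglement as an orthogonal decomposition of the learned operator that is aligned with the task structure.
Use bilinear architectures as a lens for studying how gradient dynamics preserve independent functional components.
Provide analytical results and controlled experiments across algebraic and dynamical-systems tasks to support the role of architectural bias.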

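Proposed Method

Define the task-specific interaction operator Q as a sum of rank-1 interaction matrices derived from the bilinear outputs: Q = sum_k alpha_k w_k v_k^T.
Analyze gradient flow under the squared Frobenius loss L = (1/2)||Q − Q*||_F^2 and derive the bilinear flow equation: dQ/dt = −(Q − Q*) V V^T − U U^T (Q − Q*).
Show that when Q* has the SVD Q* = sum_i s_i u_i v_i^T, the dynamics decouple into independent per-mode updates c_i(t) with no cross terms.
Argue that the bilinear parameterization keeps interaction modes independent, which makes selective unlearning and stable extrapolation possible.
Run controlled experiments on modular arithmetic, cyclic reasoning, Lie group dynamics, and unlearning benchmarks to validate the theory.
Compare multiplicative architectures (bilinear MLP, SwiGLU, GeGLU) with pointwise nonlinearities (ReLU, Tanh, Sigmoid) in terms of internal structure and unlearning behavior.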
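The decoupling claim above can be checked numerically in a toy setting. The sketch below is not the authors' code. It assumes the factorization Q = U V^T (this summary does not define U and V, but the stated flow equation is what that factorization yields), a small initialization aligned with the singular vectors of Q*, and plain Euler integration. The function name, step size, and initialization scale are illustrative choices. Under these assumptions each mode strength c_i = u_i^T Q v_i should converge to s_i while the cross terms u_i^T Q v_j (i != j) stay at round-off level.

```python
import numpy as np

def simulate_gradient_flow(s_true, c0=1e-3, dt=1e-2, steps=4000, seed=0):
    """Euler-integrate dU/dt = -(Q - Q*) V and dV/dt = -(Q - Q*)^T U, with Q = U V^T.

    Together these give dQ/dt = -(Q - Q*) V V^T - U U^T (Q - Q*).
    Returns the final mode strengths and the largest cross term seen.
    """
    rng = np.random.default_rng(seed)
    n = len(s_true)
    # Random orthonormal singular bases for the target Q* = sum_i s_i u_i v_i^T.
    A, _ = np.linalg.qr(rng.standard_normal((n, n)))
    B, _ = np.linalg.qr(rng.standard_normal((n, n)))
    Q_star = A @ np.diag(s_true) @ B.T
    # Assumption: small, balanced initialization aligned with the target modes.
    U = A * np.sqrt(c0)
    V = B * np.sqrt(c0)
    max_cross = 0.0
    for _ in range(steps):
        E = U @ V.T - Q_star                      # residual Q - Q*
        U, V = U - dt * E @ V, V - dt * E.T @ U   # simultaneous Euler step
        C = A.T @ (U @ V.T) @ B                   # Q expressed in the target's singular basis
        off_diag = C - np.diag(np.diag(C))
        max_cross = max(max_cross, float(np.abs(off_diag).max()))
    return np.diag(C), max_cross

modes, cross = simulate_gradient_flow(np.array([3.0, 1.5, 0.5]))
print("mode strengths:", modes)
print("largest cross term:", cross)
```

In this aligned setting each mode follows dc_i/dt = −2 c_i (c_i − s_i), so stronger modes are learned first. A generic random initialization is not covered by this sketch.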

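Experimental Results

Research Questions

RQ1: Does an architectural inductive bias toward multiplicative interactions promote structural disentanglement of the learned operators?
RQ2: Can bilinear architectures perform selective unlearning on entangled tasks without disturbing the retained components?
RQ3: How does the internal operator structure learned by bilinear models affect long-horizon extrapolation and invariants in dynamical systems?
RQ4: In modular arithmetic and related tasks, do bilinear models recover the true algebraic operators more faithfully than pointwise nonlinear models?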

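Key Findings

Bilinear architectures induce an orthogonal separation of functional components during training, so that specific modes can be modified independently.
On entangled tasks, bilinear models achieve nearly ideal selective unlearning while preserving the retained components far better than pointwise models.
Spectral and low-rank structure analyses show that the operators learned by bilinear models align with the true algebraic structure, whereas ReLU-based models tend to memorize distributed components.
Studies of discrete iterated (cyclic) and continuous (Lie group) dynamics show that bilinear models better preserve invariants and volume over long horizons.
Empirically, bilinear, SwiGLU, and GeGLU architectures come closer to the true addition operator and show rapid decay of unwanted components, supporting structural disentanglement as a mechanism for model editability.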
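To make the link between orthogonal modes and editability concrete, the sketch below shows the simplest form such an edit could take: zeroing one singular mode of an operator and checking that the remaining modes are untouched. This is a toy illustration under the assumption that the learned operator is available as an explicit matrix. It is not the unlearning procedure or benchmark used in the paper, and the helper names `remove_mode` and `mode_strengths` are hypothetical.

```python
import numpy as np

def remove_mode(Q, index):
    """Return a copy of Q with one singular mode (sorted by strength) set to zero."""
    U, s, Vt = np.linalg.svd(Q)
    s_edit = s.copy()
    s_edit[index] = 0.0
    return U @ np.diag(s_edit) @ Vt

def mode_strengths(Q, U_ref, V_ref):
    """Project Q onto reference modes u_i v_i^T, returning c_i = u_i^T Q v_i."""
    return np.einsum("ji,jk,ki->i", U_ref, Q, V_ref)

# Build a toy operator with known orthonormal modes and distinct strengths.
rng = np.random.default_rng(0)
U_ref, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V_ref, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Q = U_ref @ np.diag([4.0, 2.0, 1.0]) @ V_ref.T

# "Unlearn" the middle mode (index 1 after sorting by singular value).
Q_edited = remove_mode(Q, index=1)
print("before:", mode_strengths(Q, U_ref, V_ref))
print("after: ", mode_strengths(Q_edited, U_ref, V_ref))
```

Because the modes are orthogonal, removing one leaves the projections onto the others unchanged. When components are spread across many non-orthogonal directions, as the summary reports for ReLU-based models, no single edit of this kind isolates one component.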

This review was generated by AI and reviewed by human editors.