QUICK REVIEW

[Paper Review] From Embeddings to Dyson Series: Transformer Mechanics as Non-Hermitian Operator Theory

Po-Hao Chang|arXiv (Cornell University)|Mar 11, 2026

Model Reduction and Neural Networks | Citations: 0

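One-Line Summary

The paper recasts Transformer mechanics in the language of operator theory: embedding appears as a basis transformation, self-attention as a non-Hermitian interaction, and depth as an ordered, Dyson-series-like product.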
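The Dyson analogy in the summary can be written out explicitly. The first identity below is the standard time-ordered Dyson series for an interaction $V(t)$ in the interaction picture ($\hbar = 1$, $\mathcal{T}$ the time-ordering operator). The second is a discrete analogue under a simplifying assumption that is ours, not the review's: each of the $L$ residual layers is treated as a linear operator $I + V_l$, ignoring nonlinearity and input dependence. In both expansions the $n$-th order term is a product of $n$ interactions, with later ones standing to the left.

```latex
U(t, t_0) = \mathcal{T}\exp\!\left(-i\int_{t_0}^{t} V(t')\,dt'\right)
  = \sum_{n=0}^{\infty} (-i)^n \int_{t_0}^{t}\! dt_n \int_{t_0}^{t_n}\! dt_{n-1} \cdots \int_{t_0}^{t_2}\! dt_1\,
    V(t_n)\,V(t_{n-1})\cdots V(t_1)

(I + V_L)\cdots(I + V_2)(I + V_1)
  = I + \sum_{l} V_l + \sum_{l_2 > l_1} V_{l_2} V_{l_1}
    + \sum_{l_3 > l_2 > l_1} V_{l_3} V_{l_2} V_{l_1} + \cdots + V_L \cdots V_1
```

The continuous time integrals become discrete sums over ordered layer indices, and the time-ordering operator becomes the fixed left-to-right order of the layer product.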

ABSTRACT

Transformer architectures are typically described in algorithmic and statistical terms, leaving their internal mechanics without a familiar structural language for researchers trained in physical theories. To bridge this gap, we develop a complementary operator-theoretic framework that recasts their mechanics in a language familiar to many-body physics. Beginning from the token as a discrete index without intrinsic geometry, we show that embedding corresponds to a basis transformation into a continuous representation space. Once such a reference basis is established, self-attention naturally assumes the role of a non-Hermitian interaction operator, and network depth implements an ordered composition of these interactions. Within this formulation, several empirical properties of deep Transformers -- including stability at large depth, representational saturation, and the effectiveness of multi-head decomposition -- find natural structural interpretations as consequences of regulated operator composition. Together, channel factorization and normalization emerge as organizing structural logic rather than isolated architectural choices. This perspective does not rely on post-hoc analogy, but follows a constructive path where each parallel arises from the preceding structural step. By recasting Transformer mechanics in operator language, the framework lowers the conceptual barrier between deep learning and many-body physics through shared mathematical structure, making tools and intuitions from each domain more readily legible to the other.
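The abstract's starting point, a token as a discrete index with no intrinsic geometry that embedding maps into a continuous space, can be illustrated with a small NumPy sketch. It assumes the token is a one-hot basis vector $e_t$ and the embedding is a linear map `W_E` whose columns are the images of those basis vectors; the sizes, the column convention, and the names `W_E`, `one_hot`, and `embed` are hypothetical and not taken from the paper. Strictly, this is a linear map into a lower-dimensional space rather than an invertible change of basis.

```python
import numpy as np

# Hypothetical toy sizes: 5 tokens in the vocabulary, 3-dimensional representation space.
vocab_size, d_model = 5, 3
rng = np.random.default_rng(0)

# Assumed convention: column t of W_E is the continuous image of token t.
W_E = rng.normal(size=(d_model, vocab_size))

def one_hot(token_id: int, n: int) -> np.ndarray:
    """A token as a bare discrete index: a standard basis vector e_t."""
    e = np.zeros(n)
    e[token_id] = 1.0
    return e

def embed(token_ids):
    """Map each index into the continuous space: x_t = W_E @ e_t."""
    return np.stack([W_E @ one_hot(t, vocab_size) for t in token_ids])

X = embed([2, 0, 2, 4])

# The familiar lookup table is exactly this linear map: W_E @ e_t selects column t.
assert np.allclose(X, W_E[:, [2, 0, 2, 4]].T)

# One-hot indices carry no geometry: every pair of distinct tokens is sqrt(2) apart.
basis = np.eye(vocab_size)
dists = [np.linalg.norm(basis[a] - basis[b])
         for a in range(vocab_size) for b in range(a + 1, vocab_size)]
```

Once embedded, distances between token vectors depend on `W_E`, which is what gives the representation space a geometry that later interactions can act on.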

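Motivation and Objectives

Bridge the gap between Transformer mechanics and many-body physics with an operator-theoretic framework.
Interpret embedding, attention, and normalization as structured basis transformations and products of regulated operators.
Explain empirical Transformer properties (stability, saturation, the effectiveness of multi-head attention) through operator dynamics.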

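Proposed Method

Define token embedding as a basis transformation into a continuous representation space.
Model self-attention as a non-Hermitian interaction operator built from the Q, K, V projections.
Describe multi-head attention as a channel (subspace) factorization of the interaction operator.
Treat residual connections as an algebraic expansion of operator composition, and layer normalization as a stability regulator.
Express depth as an ordered product of per-layer operators, in a form resembling the Dyson series.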
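The attention, residual, and normalization steps listed above can be sketched as one layer in NumPy. This is a minimal single-head sketch under stated assumptions: random stand-in projections `W_Q`, `W_K`, `W_V`, a pre-norm layout, and layer normalization without learned gain or bias. The paper's exact layer form is not given in this review. The point is that the token-space interaction matrix `A` built from the Q and K projections is row-stochastic but generally not symmetric, which for a real matrix is what "non-Hermitian" means.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, d_model = 4, 8

# Hypothetical projection matrices; in a real model these are learned weights.
W_Q, W_K, W_V = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(3))

def softmax_rows(S):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    S = S - S.max(axis=1, keepdims=True)
    P = np.exp(S)
    return P / P.sum(axis=1, keepdims=True)

def attention_operator(H):
    """Token-space interaction A[i, j]: weight with which token j updates token i."""
    Q, K = H @ W_Q, H @ W_K
    return softmax_rows(Q @ K.T / np.sqrt(d_model))

def layer_norm(X, eps=1e-5):
    """Per-token zero mean and unit variance (learned gain and bias omitted)."""
    mu = X.mean(axis=1, keepdims=True)
    var = X.var(axis=1, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)

def layer(X):
    """One pre-norm residual attention step: X -> X + A(H) H W_V with H = LN(X)."""
    H = layer_norm(X)
    A = attention_operator(H)
    return X + A @ H @ W_V, A

X = rng.normal(size=(n_tokens, d_model))
X_next, A = layer(X)

# For a real matrix, Hermitian means symmetric; A generally is not.
is_hermitian = np.allclose(A, A.conj().T)
row_sums = A.sum(axis=1)  # each row is a probability distribution over source tokens
```

Asymmetry has two sources here: `W_Q` and `W_K` differ, and the softmax normalizes rows but not columns.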

Figure 1: Schematic representation of the Transformer architecture. (a) Transformer layers are depicted as discrete evolution steps along the vertical axis (red), representing ordered layer depth. (b) Detailed view of an individual $l$-th layer: the Self-attention block introduces non-local, off-di
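Figure 1 treats each layer as a discrete evolution step along depth. The NumPy sketch below shows why the order of those steps matters. It assumes each layer is linearized to a fixed matrix $U_l = I + V_l$; real layers are nonlinear and input-dependent, so the names `Vs`, `Us`, and `evolve` are purely illustrative.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(2)
d, L = 6, 3

# Hypothetical linearized layers: layer l acts as U_l = I + V_l with a small V_l.
Vs = [0.3 * rng.normal(size=(d, d)) for _ in range(L)]
Us = [np.eye(d) + V for V in Vs]

def evolve(x, ops):
    """Apply layers in depth order: x_L = U_L ... U_2 U_1 x_0."""
    for U in ops:
        x = U @ x
    return x

x0 = rng.normal(size=d)
forward = evolve(x0, Us)
reversed_order = evolve(x0, Us[::-1])

# The depth-ordered product, accumulated with later layers on the left.
U_total = reduce(lambda acc, U: U @ acc, Us, np.eye(d))
```

`U_total @ x0` reproduces the step-by-step evolution, while running the same layers in reverse generically gives a different state, because the matrices do not commute.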

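Results

Research Questions

RQ1: Can Transformer mechanics be restated in the language of non-Hermitian operator theory and many-body physics?
RQ2: What structural interpretations do embedding, self-attention, normalization, and depth receive within the operator framework?
RQ3: How do depth, multi-head attention, and residual connections translate into operator products and stabilization mechanisms?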

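Key Findings

Embedding acts as a basis transformation into a continuous latent space before any contextual interaction takes place.
Self-attention corresponds to a non-Hermitian interaction operator, and depth implements an ordered product of these interactions.
Residual connections yield an exact algebraic expansion over ordered interaction paths, while normalization regulates the magnitude of successive updates.
Multi-head attention implements a channel factorization of the interaction operator, enabling structured multi-channel coupling.
Layer normalization and other stabilizers regulate nonlinear operator composition, maintaining stability at scale.
The expansion over depth resembles a time-ordered Dyson series, with depth playing the role of time, and exhibits higher-order, ordered perturbative mixing across layers.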
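The "exact algebraic expansion over ordered interaction paths" and its Dyson-like structure can be checked numerically. This sketch assumes, for illustration only, that each residual layer is a fixed linear map $I + V_l$ (real attention layers are nonlinear and input-dependent). Under that assumption the product $(I + V_L)\cdots(I + V_1)$ splits exactly into $2^L$ path terms, one per increasing subset of layers, grouped by order $n$ like the terms of a Dyson series.

```python
import itertools
import numpy as np
from functools import reduce

rng = np.random.default_rng(3)
d, L = 4, 4

# Hypothetical linearized residual updates V_l (illustrative, not from the paper).
Vs = [0.2 * rng.normal(size=(d, d)) for _ in range(L)]

# Left side: depth-ordered product (I + V_L) ... (I + V_1), later layers on the left.
product = reduce(lambda acc, V: (np.eye(d) + V) @ acc, Vs, np.eye(d))

# Right side: sum over every ordered path l_1 < l_2 < ... < l_n,
# each path contributing V_{l_n} ... V_{l_1}. order_terms[n] collects order n.
order_terms = []
for n in range(L + 1):
    term_n = np.zeros((d, d))
    for path in itertools.combinations(range(L), n):  # increasing layer indices
        term = np.eye(d)
        for l in path:
            term = Vs[l] @ term  # left-multiply so later layers stand to the left
        term_n += term
    order_terms.append(term_n)

expansion = sum(order_terms)

# Size of each order's contribution; with small V_l, higher orders tend to be smaller.
print([round(float(np.linalg.norm(t)), 3) for t in order_terms])
```

Order 0 is the identity (the pure residual path), order 1 is the sum of single-layer updates, and order $L$ is the single path through every layer.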

Figure 2: Schematic view of multi-head attention as operator channel factorization. (a) A dense effective interaction $V_{\text{eff}}$ operating on the full representation space $d_{\text{model}}$ to map input states $x_{j}$ to updated states $x_{i}$. (b) In multi-head attention, the interaction is block-
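Figure 2's contrast between a dense interaction and a channel-factorized one can be made concrete by writing multi-head attention as a single operator on the combined token-and-channel space. This is a sketch under our own assumptions: random row-stochastic head patterns `A` stand in for computed attention, `W_V` holds hypothetical per-head value maps, and the output projection $W_O$ (which mixes channels afterward) is omitted. Each head then contributes one Kronecker-product term $A_h \otimes W_h^{\top}$ that writes only into its own block of output channels.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d_model, n_heads = 3, 4, 2
d_head = d_model // n_heads

X = rng.normal(size=(n, d_model))
# Hypothetical per-head attention patterns (each row sums to 1) and value maps.
A = [rng.dirichlet(np.ones(n), size=n) for _ in range(n_heads)]
W_V = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]

# Usual multi-head output before any output projection W_O: concatenate heads.
concat = np.concatenate([A[h] @ X @ W_V[h] for h in range(n_heads)], axis=1)

# The same map as one operator on token x channel space, using the row-major
# identity vec(A X W) = kron(A, W.T) @ vec(X).
# Each head's value map is zero-padded so it writes only into its own channel block.
V_eff = np.zeros((n * d_model, n * d_model))
for h in range(n_heads):
    W_pad = np.zeros((d_model, d_model))
    W_pad[:, h * d_head:(h + 1) * d_head] = W_V[h]
    V_eff += np.kron(A[h], W_pad.T)
```

A single-head layer would instead be one term $A \otimes W^{\top}$, with a single token-mixing pattern shared by every channel. The multi-head form lets each channel block carry its own pattern.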

This review was generated by AI and checked by a human editor.