QUICK REVIEW

[Paper Review] Verify Implementation Equivalence of Large Models

Qi Zhan, Xing Hu | arXiv (Cornell University) | Mar 23, 2026

Model-Driven Software Engineering Techniques | Cited by: 0

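One-Line Summary

Emerge verifies the equivalence of large-model implementations across frameworks by placing them in an e-graph and synthesizing and validating rewrite rules on demand, which enables robust equivalence verification without manually written rules.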

ABSTRACT

Verifying whether two implementations of the same large model are equivalent across frameworks is difficult in practice. Even when they realize the same computation, their graphs may differ substantially in operator decomposition, tensor layout, and the use of fused or opaque kernels, making manual rewrite rules hard to build and maintain. We present Emerge, a framework for checking Implementation Equivalence over computation graphs of large-model implementations. Instead of writing rules manually, Emerge represents the two implementations in an e-graph, infers candidate relations from execution values, and synthesizes rewrite rules on demand when existing rules are insufficient. Each synthesized rule is validated using the strongest applicable method, including SMT-based checking for symbolically tractable cases and constraint-aware randomized testing for opaque kernels, and then propagated through e-graph rebuilding to establish larger equivalences. Our current implementation targets inference computation graphs captured from HuggingFace Transformers and vLLM. Our evaluation shows that Emerge establishes equivalence for correct implementation pairs at practical cost, while also providing useful by-products for debugging: it detects 10 of 13 known implementation bugs and uncovers 8 previously unknown implementation issues that were later confirmed by developers. In addition, Emerge synthesizes block-level rules that compare favorably with manually authored ones.
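The abstract mentions constraint-aware randomized testing for rules that cannot be checked symbolically. The review does not describe how Emerge implements it, so the following is only a minimal sketch of the general idea under stated assumptions: the function `validate_rule`, the samplers, and the example rule `relu(x) == x` for non-negative inputs (for example, a value produced by softmax) are all hypothetical and are not taken from the paper.

```python
import math
import random


def validate_rule(lhs, rhs, sample, trials=1000, rel_tol=1e-9, abs_tol=1e-12, seed=0):
    """Compare lhs and rhs on inputs drawn by `sample`.

    Returns (True, None) if every trial agrees within tolerance,
    otherwise (False, counterexample_args). Passing all trials is
    evidence for the rule, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        args = sample(rng)
        if not math.isclose(lhs(*args), rhs(*args), rel_tol=rel_tol, abs_tol=abs_tol):
            return False, args
    return True, None


# Hypothetical candidate rule: relu(x) == x, which only holds when x >= 0.
def relu(x):
    return max(x, 0.0)


def identity(x):
    return x


def sample_non_negative(rng):
    # Sampler that respects the rule's precondition (x >= 0).
    return (rng.uniform(0.0, 1.0),)


def sample_any_real(rng):
    # Sampler that ignores the precondition.
    return (rng.uniform(-1.0, 1.0),)


ok_constrained, _ = validate_rule(relu, identity, sample_non_negative)
ok_unconstrained, counterexample = validate_rule(relu, identity, sample_any_real)
```

With the constrained sampler the rule survives every trial, while the unconstrained sampler finds a negative input on which the two sides disagree. This illustrates why the sampling domain has to match the precondition under which a rule is supposed to hold.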

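Motivation and Objectives

Motivate and formalize the problem of Implementation Equivalence between model implementations from different frameworks.
Provide a verification framework based on dynamic rule synthesis that does not rely on manually written rewrite rules.
Demonstrate practical effectiveness in bug detection, equivalence verification, and the quality of the synthesized rules.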

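Proposed Method

Represent both implementations in a shared e-graph and establish node-level equivalences incrementally.
Infer candidate relations from execution traces and, where needed, extend the graph with auxiliary transformations.
Synthesize rewrite rules on the fly to connect subgraphs that do not align structurally but are semantically related, and validate each rule with an SMT solver or constraint-aware randomized testing.
Propagate established equivalences through e-graph rebuilding to cover larger portions of the computation graph.
Implement the approach on top of TorchDynamo to extract computation graphs from production code, and evaluate it on Transformers and vLLM.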
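To make the role of e-graph rebuilding more concrete, here is a toy sketch of a shared e-graph in which merging two subgraphs lets equivalence propagate to the nodes that consume them. It is not Emerge's implementation: the `EGraph` class, the operator names (`matmul`, `add`, `addmm`, `gelu`), and the rule being applied are assumptions chosen for illustration, and the rule is simply assumed to have been validated already.

```python
class EGraph:
    """Toy e-graph: a union-find over e-class ids plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointer per e-class id
        self.hashcons = {}  # (op, canonical child ids) -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def add(self, op, *children):
        node = (op, tuple(self.find(c) for c in children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        """Restore congruence: same op + equivalent children => same e-class."""
        changed = True
        while changed:
            changed = False
            fresh = {}
            for (op, kids), cid in self.hashcons.items():
                node = (op, tuple(self.find(k) for k in kids))
                cid = self.find(cid)
                if node in fresh and self.find(fresh[node]) != cid:
                    self.union(fresh[node], cid)
                    changed = True
                fresh[node] = self.find(cid)
            self.hashcons = fresh

    def equiv(self, a, b):
        return self.find(a) == self.find(b)


def demo():
    g = EGraph()
    x, w, b = g.add("x"), g.add("w"), g.add("b")
    decomposed = g.add("add", g.add("matmul", x, w), b)  # one implementation
    fused = g.add("addmm", b, x, w)                      # the other implementation
    out_a = g.add("gelu", decomposed)
    out_b = g.add("gelu", fused)
    before = g.equiv(out_a, out_b)
    g.union(decomposed, fused)  # apply an (assumed already validated) rule
    g.rebuild()                 # congruence closure merges the two gelu nodes
    return before, g.equiv(out_a, out_b)
```

In `demo`, the two `gelu` outputs start in different e-classes and end up in the same one after a single local merge followed by `rebuild`. This is the sense in which a rule established on a small subgraph can extend to larger portions of the computation graph.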

Figure 1. A part of the GPT-2 model used to illustrate equivalence verification between two implementations. Simplified and adjusted for clarity.

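Experimental Results

Research Questions

RQ1: Can Emerge determine whether two implementations from different frameworks realize the same functionality?
RQ2: How effectively does dynamic rule synthesis discover equivalences when manual rules are unavailable?
RQ3: How effective are SMT-based checking and constraint-aware randomized testing at validating synthesized rules?
RQ4: What practical bug-detection capability does Emerge provide for real-world large-model implementations?

Key Findings

Emerge detects 10 of 13 known implementation bugs.
Emerge uncovers 8 previously unknown implementation issues that were later confirmed by developers.
Emerge establishes equivalence for correct implementation pairs at practical cost.
The synthesized high-level rewrite rules compare favorably with manually authored ones.
The rules are useful for fault localization, and they propagate across model layers, spreading their cost over the layers of the model.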

Figure 2. Rule synthesis from execution traces. ① Initial relation ② Relation inferred from input values ③ Relation inferred from rule synthesis.
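One plausible way to infer relations from recorded values is to fingerprint what each node produced on a few probe inputs and pair up nodes whose fingerprints coincide. The sketch below assumes traces are plain dictionaries mapping a node name to a list of flattened outputs (one per probe input). The names `fingerprint`, `candidate_pairs`, and the node labels are hypothetical, and rounding is a deliberately crude stand-in for a proper numerical tolerance.

```python
def fingerprint(values, digits=6):
    """Round a flat list of floats so numerically close outputs collide."""
    return tuple(round(v, digits) for v in values)


def candidate_pairs(trace_a, trace_b, digits=6):
    """Pair nodes from two traces whose values match on every probe input.

    A match is only a candidate relation; it still has to be validated
    (for example symbolically or by further testing) before it is trusted.
    """
    index = {}
    for name, runs in trace_a.items():
        key = tuple(fingerprint(r, digits) for r in runs)
        index.setdefault(key, []).append(name)
    pairs = []
    for name, runs in trace_b.items():
        key = tuple(fingerprint(r, digits) for r in runs)
        for other in index.get(key, []):
            pairs.append((other, name))
    return pairs


# Two probe inputs, shared by both (hypothetical) implementations.
probes = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
W, B = 3.0, 0.25

# Implementation A: a multiply followed by an add.
trace_a = {
    "a.mul": [[x * W for x in p] for p in probes],
    "a.add": [[x * W + B for x in p] for p in probes],
}
# Implementation B: one fused node, plus an unrelated scaling node.
trace_b = {
    "b.addmm": [[x * W + B for x in p] for p in probes],
    "b.scale": [[x * 2.0 for x in p] for p in probes],
}

pairs = candidate_pairs(trace_a, trace_b)
```

Here only `a.add` and `b.addmm` are paired, because they are the only nodes whose values agree on both probe inputs. Using more than one probe reduces accidental matches, although agreement on sampled values never proves equivalence by itself.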


This review was generated by AI and checked by a human editor.