QUICK REVIEW

[Paper Digest] Verify Implementation Equivalence of Large Models

Qi Zhan, Xing Hu | arXiv (Cornell University) | Mar 23, 2026

Model-Driven Software Engineering Techniques | Cited by: 0

One-Sentence Summary

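Emerge checks whether implementations of a large model in different frameworks are consistent by using an equivalence graph (e-graph) to synthesize and validate rewrite rules on demand, which gives robust consistency verification without hand-written rules.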

ABSTRACT

Verifying whether two implementations of the same large model are equivalent across frameworks is difficult in practice. Even when they realize the same computation, their graphs may differ substantially in operator decomposition, tensor layout, and the use of fused or opaque kernels, making manual rewrite rules hard to build and maintain. We present Emerge, a framework for checking Implementation Equivalence over computation graphs of large-model implementations. Instead of writing rules manually, Emerge represents the two implementations in an e-graph, infers candidate relations from execution values, and synthesizes rewrite rules on demand when existing rules are insufficient. Each synthesized rule is validated using the strongest applicable method, including SMT-based checking for symbolically tractable cases and constraint-aware randomized testing for opaque kernels, and then propagated through e-graph rebuilding to establish larger equivalences. Our current implementation targets inference computation graphs captured from HuggingFace Transformers and vLLM. Our evaluation shows that Emerge establishes equivalence for correct implementation pairs at practical cost, while also providing useful by-products for debugging: it detects 10 of 13 known implementation bugs and uncovers 8 previously unknown implementation issues that were later confirmed by developers. In addition, Emerge synthesizes block-level rules that compare favorably with manually authored ones.

Research Motivation and Goals

Motivate and formalize the problem of Implementation Equivalence across model implementations from different frameworks.
Provide a dynamic, rule-synthesis based verification framework that does not rely on manually authored rewrite rules.
Demonstrate practical effectiveness in bug detection, equivalence verification, and quality of synthesized rules.

Proposed Method

Represent both implementations in a joint e-graph and incrementally establish node-level equivalences.
Infer candidate relations from execution traces and augment the graph with auxiliary transformations when necessary.
Synthesize rewrite rules on the fly to connect mismatched but semantically related subgraphs, and validate them with SMT solving or constraint-aware randomized testing.
Propagate established equivalences through e-graph rebuilding to cover larger portions of the computation graphs.
Build the implementation on TorchDynamo to extract computation graphs from production code, and evaluate it on HuggingFace Transformers and vLLM.
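
The first four steps above can be illustrated with a toy e-graph. The sketch below is not Emerge's implementation: the `EGraph` class, the operator names (`matmul`, `add`, `fused_linear`, `gelu`), and the recorded trace values are all hypothetical, chosen only to show how a relation inferred from execution values, once merged, propagates upward through rebuilding. It also assumes the candidate rule has already been validated, which the real system does with SMT solving or randomized testing.

```python
import math


class EGraph:
    """Toy e-graph: a union-find over e-class ids plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, indexed by e-class id
        self.hashcons = {}  # canonical e-node (op, child ids) -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def add(self, op, *children):
        node = (op, tuple(self.find(c) for c in children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        """Restore congruence: same op + equivalent children => same e-class."""
        changed = True
        while changed:
            changed = False
            fresh = {}
            for (op, kids), cid in self.hashcons.items():
                node = (op, tuple(self.find(k) for k in kids))
                cid = self.find(cid)
                if node in fresh and self.find(fresh[node]) != cid:
                    self.union(fresh[node], cid)
                    changed = True
                fresh[node] = self.find(cid)
            self.hashcons = fresh

    def equiv(self, a, b):
        return self.find(a) == self.find(b)


def candidate_pairs(trace_a, trace_b, tol=1e-6):
    """Pair nodes whose recorded outputs agree element-wise within tol."""
    return [
        (na, nb)
        for na, va in trace_a.items()
        for nb, vb in trace_b.items()
        if len(va) == len(vb)
        and all(math.isclose(p, q, abs_tol=tol) for p, q in zip(va, vb))
    ]


g = EGraph()
x, w, b = g.add("x"), g.add("w"), g.add("b")
a_lin = g.add("add", g.add("matmul", x, w), b)  # hypothetical implementation A
b_lin = g.add("fused_linear", x, w, b)          # hypothetical implementation B
a_out = g.add("gelu", a_lin)
b_out = g.add("gelu", b_lin)

# Made-up execution values recorded at the two linear nodes.
trace_a = {"a_lin": [0.5, -1.25]}
trace_b = {"b_lin": [0.5, -1.25]}
ids = {"a_lin": a_lin, "b_lin": b_lin}

before = g.equiv(a_out, b_out)
for na, nb in candidate_pairs(trace_a, trace_b):
    g.union(ids[na], ids[nb])  # assumes the candidate rule passed validation
g.rebuild()
after = g.equiv(a_out, b_out)
print(before, after)
```

Matching values only suggest a relation; they do not prove it, which is why the candidate is treated here as already validated. After the two linear nodes are merged, `rebuild` notices that both `gelu` nodes now have equivalent children and merges them as well, so one validated rule covers a larger part of the graph.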

Figure 1. Part of the GPT-2 model used to illustrate equivalence verification between two implementations. Simplified and adjusted for clarity.

Experimental Results

Research Questions

RQ1: Can Emerge determine whether two implementations from different frameworks realize the same function?
RQ2: How effective is dynamic rule synthesis at discovering equivalences where manual rules are unavailable?
RQ3: How effective are SMT-based checking and constraint-aware randomized testing at validating synthesized rules?
RQ4: What practical bug-detection capabilities does Emerge provide for real-world large-model implementations?
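
To make the randomized-testing side of RQ3 concrete, here is a minimal sketch of validating a candidate rewrite rule by sampling inputs that satisfy the rule's constraints. It does not reproduce Emerge's validator: the rule (a transposed matrix product), the shape constraint, the helper names, and the trial count are all assumptions picked for illustration, and plain Python lists stand in for tensors.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sample_inputs(rng, square=False):
    """Draw (A, B) that satisfy the rule's constraint: cols(A) == rows(B)."""
    m, k, n = (rng.randint(1, 4) for _ in range(3))
    if square:
        m = n = k  # keeps both sides of a shape-sensitive rule executable

    def rand(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return rand(m, k), rand(k, n)


def close(p, q, tol):
    """True if two matrices have the same shape and entries within tol."""
    return len(p) == len(q) and all(
        len(r) == len(s) and all(abs(x - y) <= tol for x, y in zip(r, s))
        for r, s in zip(p, q)
    )


def validate_rule(lhs, rhs, sampler, trials=200, tol=1e-9, seed=0):
    """Accept a candidate rule only if no sampled input separates lhs and rhs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = sampler(rng)
        if not close(lhs(a, b), rhs(a, b), tol):
            return False  # counterexample found: reject the rule
    return True  # no counterexample found: evidence, not a proof


def lhs(a, b):
    return transpose(matmul(a, b))


def good_rhs(a, b):
    return matmul(transpose(b), transpose(a))  # (AB)^T = B^T A^T


def bad_rhs(a, b):
    return matmul(transpose(a), transpose(b))  # wrong operand order


ok_good = validate_rule(lhs, good_rhs, sample_inputs)
ok_bad = validate_rule(lhs, bad_rhs, lambda rng: sample_inputs(rng, square=True))
print(ok_good, ok_bad)
```

Sampling under the constraint matters because unconstrained random shapes would mostly produce inputs on which neither side can run, so the test would say nothing about the rule. A passing run gives only statistical confidence, which is consistent with the abstract's preference for SMT-based checking whenever a rule is symbolically tractable.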

Key Findings

Emerge detects 10 of 13 known implementation bugs.
Emerge uncovers 8 previously unknown implementation issues later confirmed by developers.
Emerge establishes equivalence for correct implementation pairs at practical cost.
Synthesized high-level rewrite rules compare favorably with manually authored ones.
Synthesized rules are useful for fault localization, and they propagate across model layers, which amortizes their cost.

Figure 2. Rule synthesis from execution traces: ① initial relation; ② relation inferred from input values; ③ relation inferred from rule synthesis.

This digest was generated by AI and reviewed by human editors.