QUICK REVIEW

[Paper Review] Verify Implementation Equivalence of Large Models

Qi Zhan, Xing Hu | arXiv (Cornell University) | 2026. 03. 23.

Model-Driven Software Engineering Techniques | Citations: 0

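One-Line Summary

Emerge checks implementation equivalence between large-model implementations using an e-graph, synthesizing and validating rewrite rules on demand, which enables robust equivalence verification without hand-written rules.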

ABSTRACT

Verifying whether two implementations of the same large model are equivalent across frameworks is difficult in practice. Even when they realize the same computation, their graphs may differ substantially in operator decomposition, tensor layout, and the use of fused or opaque kernels, making manual rewrite rules hard to build and maintain. We present Emerge, a framework for checking Implementation Equivalence over computation graphs of large-model implementations. Instead of writing rules manually, Emerge represents the two implementations in an e-graph, infers candidate relations from execution values, and synthesizes rewrite rules on demand when existing rules are insufficient. Each synthesized rule is validated using the strongest applicable method, including SMT-based checking for symbolically tractable cases and constraint-aware randomized testing for opaque kernels, and then propagated through e-graph rebuilding to establish larger equivalences. Our current implementation targets inference computation graphs captured from HuggingFace Transformers and vLLM. Our evaluation shows that Emerge establishes equivalence for correct implementation pairs at practical cost, while also providing useful by-products for debugging: it detects 10 of 13 known implementation bugs and uncovers 8 previously unknown implementation issues that were later confirmed by developers. In addition, Emerge synthesizes block-level rules that compare favorably with manually authored ones.

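Motivation and Goals

Motivates and formalizes the implementation-equivalence problem for model implementations across different frameworks.
Provides a verification framework based on dynamic rule synthesis that does not rely on manually written rewrite rules.
Demonstrates practical effectiveness in bug detection, equivalence verification, and the quality of the synthesized rules.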

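Proposed Method

Represents both implementations in a single e-graph and incrementally establishes node-level equivalences.
Infers candidate relations from execution traces and, where needed, augments the graph with auxiliary transformations.
Synthesizes rewrite rules on the fly to connect subgraphs that are semantically related but do not match, and validates them with SMT solving or constraint-aware randomized testing.
Propagates established equivalences through e-graph rebuilding to cover larger portions of the computation graph.
Is implemented on top of TorchDynamo to extract computation graphs from production code, and is evaluated on Transformers and vLLM.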
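
The propagation step can be illustrated with a toy e-graph. The sketch below is not Emerge's implementation: it is a minimal union-find plus hashcons structure written for this review, and the operator names (`gelu_reference`, `gelu_fused_kernel`, `matmul`) are hypothetical. It only shows how merging two nodes and then rebuilding makes the nodes that consume them equivalent as well.

```python
class EGraph:
    """Toy e-graph: union-find over e-class ids plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, one per e-class id
        self.hashcons = {}  # e-node (op, child class ids) -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def add(self, op, *children):
        node = (op, tuple(self.find(c) for c in children))
        if node not in self.hashcons:
            self.parent.append(len(self.parent))
            self.hashcons[node] = len(self.parent) - 1
        return self.find(self.hashcons[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        """Restore congruence: nodes with the same op and equivalent children merge."""
        changed = True
        while changed:
            changed = False
            new_hashcons = {}
            for (op, children), cls in self.hashcons.items():
                node = (op, tuple(self.find(c) for c in children))
                cls = self.find(cls)
                if node in new_hashcons and self.find(new_hashcons[node]) != cls:
                    self.union(new_hashcons[node], cls)
                    changed = True
                new_hashcons.setdefault(node, cls)
            self.hashcons = new_hashcons

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


g = EGraph()
x = g.add("input_x")
w = g.add("weight_w")
# Implementation A (hypothetical): reference GELU followed by a matmul.
a_act = g.add("gelu_reference", x)
a_out = g.add("matmul", a_act, w)
# Implementation B (hypothetical): fused GELU kernel followed by the same matmul.
b_act = g.add("gelu_fused_kernel", x)
b_out = g.add("matmul", b_act, w)

before = g.equivalent(a_out, b_out)  # outputs are not yet known to be equal
g.union(a_act, b_act)                # assume a validated rule relates the two GELU nodes
g.rebuild()                          # congruence now merges the two matmul nodes
after = g.equivalent(a_out, b_out)
```

Here a single validated local rule is enough to relate the two outputs, because the surrounding structure is identical. Production e-graph libraries use more efficient rebuilding than this repeated full pass.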

Figure 1. A part of the GPT-2 model used to illustrate equivalence verification between two implementations. Simplified and adjusted for clarity.

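Experimental Results

Research Questions

RQ1: Can Emerge determine whether two implementations from different frameworks implement the same function?
RQ2: How effective is dynamic rule synthesis at discovering equivalences when no manual rules are available?
RQ3: How effective are SMT-based checking and constraint-aware randomized testing at validating synthesized rules?
RQ4: What practical bug-detection capability does Emerge provide for real-world large-model implementations?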
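
To make the validation step behind RQ3 concrete, the following sketch shows one possible reading of constraint-aware randomized testing. It is an assumption of this review, not the paper's code: the RMSNorm functions, the preconditions encoded in `sample_inputs`, and the tolerances are all hypothetical. The idea is that inputs are drawn only from the region where the opaque kernel is assumed to be defined, and the candidate rule is rejected on the first counterexample.

```python
import math
import random


def rmsnorm_decomposed(x, w, eps):
    """Operator-by-operator form, as a framework might trace it."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) * wi for v, wi in zip(x, w)]


def rmsnorm_opaque(x, w, eps):
    """Stand-in for a fused kernel; the checker calls it but never inspects it."""
    scale = (sum(v * v for v in x) / len(x) + eps) ** -0.5
    return [v * wi * scale for v, wi in zip(x, w)]


def rmsnorm_buggy(x, w, eps):
    """Hypothetical bug: sums the squares instead of averaging them."""
    scale = (sum(v * v for v in x) + eps) ** -0.5
    return [v * wi * scale for v, wi in zip(x, w)]


def sample_inputs(rng):
    """Draw inputs satisfying the assumed preconditions:
    x and w have the same length, and eps is strictly positive."""
    n = rng.randint(1, 16)
    x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    w = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    eps = rng.choice([1e-6, 1e-5, 1e-3])
    return x, w, eps


def check_rule(lhs, rhs, trials=200, rtol=1e-9, atol=1e-12, seed=0):
    """Return (True, None) if no counterexample is found, else (False, inputs)."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = sample_inputs(rng)
        out_l, out_r = lhs(*args), rhs(*args)
        same = len(out_l) == len(out_r) and all(
            math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
            for a, b in zip(out_l, out_r)
        )
        if not same:
            return False, args
    return True, None


ok_good, cex_good = check_rule(rmsnorm_decomposed, rmsnorm_opaque)
ok_bad, cex_bad = check_rule(rmsnorm_decomposed, rmsnorm_buggy)
```

Passing such a test is evidence rather than proof, which is consistent with the abstract's description of randomized testing as the fallback for opaque kernels when SMT-based checking is not applicable.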

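Key Results

Emerge detects 10 of 13 known implementation bugs.
Emerge uncovers 8 previously unknown implementation issues that were later confirmed by developers.
Emerge establishes equivalence for correct implementation pairs at practical cost.
The synthesized high-level rewrite rules compare favorably with manually authored ones.
The rules are useful for fault localization and help distribute cost across model layers.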

Figure 2. Rule synthesis from execution traces. ① Initial relation ② Relation inferred from input values ③ Relation inferred from rule synthesis.
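
As a rough illustration of how relations might be inferred from recorded values, the sketch below compares tensors captured from two traces and proposes a candidate relation whenever the values agree, either directly or after one auxiliary transformation (a transpose, standing in for a layout difference). The trace format, node names, and the choice of transpose are assumptions made for this example, not details from the paper.

```python
import math


def close(a, b, rtol=1e-6, atol=1e-9):
    """Elementwise comparison of two matrices stored as nested lists."""
    return len(a) == len(b) and all(
        len(ra) == len(rb)
        and all(math.isclose(p, q, rel_tol=rtol, abs_tol=atol) for p, q in zip(ra, rb))
        for ra, rb in zip(a, b)
    )


def transpose(m):
    return [list(col) for col in zip(*m)]


def candidate_relations(trace_a, trace_b):
    """Pair up nodes whose recorded values agree, directly or up to a transpose."""
    relations = []
    for name_a, val_a in trace_a.items():
        for name_b, val_b in trace_b.items():
            if close(val_a, val_b):
                relations.append((name_a, name_b, "identity"))
            elif close(val_a, transpose(val_b)):
                relations.append((name_a, name_b, "transpose"))
    return relations


# Hypothetical recorded values from one run of each implementation.
trace_a = {
    "a.hidden": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "a.logits": [[0.5, -1.0]],
}
trace_b = {
    "b.hidden_t": [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]],
    "b.logits": [[0.5, -1.0]],
}

relations = candidate_relations(trace_a, trace_b)
```

Agreement on a single input only yields candidates. Each one would still need to be validated, for example by SMT-based checking or randomized testing, before being merged into the e-graph.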


This review was generated by AI and reviewed by a human editor.