QUICK REVIEW

[Paper Review] SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks

Elizabeth S. Z. Tan, Adil Soubki|arXiv (Cornell University)|Feb 24, 2026

Machine Learning in Materials Science | Citations: 0

One-Line Summary

SymTorch symbolically distills neural network components into interpretable surrogate representations. The case studies cover GNNs, PINNs, and LLMs.

ABSTRACT

Symbolic distillation replaces neural networks, or components thereof, with interpretable, closed-form mathematical expressions. This approach has shown promise in discovering physical laws and mathematical relationships directly from trained deep learning models, yet adoption remains limited due to the engineering barrier of integrating symbolic regression into deep learning workflows. We introduce SymTorch, a library that automates this distillation by wrapping neural network components, collecting their input-output behavior, and approximating them with human-readable equations via PySR. SymTorch handles the engineering challenges that have hindered adoption: GPU-CPU data transfer, input-output caching, model serialization, and seamless switching between neural and symbolic forward passes. We demonstrate SymTorch across diverse architectures including GNNs, PINNs and transformer models. Finally, we present a proof-of-concept for accelerating LLM inference by replacing MLP layers with symbolic surrogates, achieving an 8.3% throughput improvement with moderate performance degradation.

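Motivation and Objectives

Provide an open-source framework that automates symbolic distillation of neural network components.
Lower the engineering barrier by handling GPU-CPU data transfer, caching, serialization, and forward-pass switching.
Demonstrate applicability across architectures (GNNs, PINNs, Transformers) and present a potential speedup of LLM inference.
Show how symbolic surrogates recover known physical laws and expose arithmetic biases in LLMs.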

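Proposed Method

Wrap an NN component as a SymbolicModel block and collect its input-output data during the forward pass.
Run symbolic regression with PySR on the collected I/O to obtain a closed-form expression for each output dimension.
Replace selected neural blocks with symbolic equations from the Pareto front to create a hybrid model.
Cache activations during the forward pass and support seamless switching between neural and symbolic computation.
Fit a symbolic surrogate to the neighborhood of a point of interest to provide SLIME-style local explanations.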
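
The wrap, collect, and switch steps above can be sketched without any deep learning framework. This is a minimal illustration, not SymTorch's API: the class name `RecordingWrapper`, its attributes, and the toy component are all hypothetical, and the surrogate expression is written by hand where a real workflow would take it from symbolic regression (PySR, per the paper) on the cached data. In a PyTorch setting the component would be a model layer and the recorded values would be tensors moved to the CPU.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Component = Callable[[Sequence[float]], float]


class RecordingWrapper:
    """Hypothetical wrapper around one model component (not SymTorch's API)."""

    def __init__(self, component: Component) -> None:
        self.component = component                  # original "neural" block
        self.surrogate: Optional[Component] = None  # closed-form replacement
        self.use_surrogate = False                  # selects the forward pass
        self.cache: List[Tuple[Tuple[float, ...], float]] = []

    def __call__(self, x: Sequence[float]) -> float:
        if self.use_surrogate:
            if self.surrogate is None:
                raise RuntimeError("attach a surrogate before switching modes")
            return self.surrogate(x)
        y = self.component(x)
        self.cache.append((tuple(x), y))            # record the I/O pair
        return y

    def max_abs_error(self) -> float:
        """Largest gap between the surrogate and the cached neural outputs."""
        if self.surrogate is None:
            raise RuntimeError("no surrogate attached")
        return max(abs(self.surrogate(x) - y) for x, y in self.cache)


def toy_block(x: Sequence[float]) -> float:
    """Stand-in for a trained component; a real one would be a neural layer."""
    return 2.0 * x[0] + x[1] * x[1]


block = RecordingWrapper(toy_block)
samples = [(0.1 * i, 0.5 - 0.05 * i) for i in range(20)]
for s in samples:
    block(s)                                        # neural pass fills the cache

# Assumption: in practice this expression would come from symbolic regression.
block.surrogate = lambda x: 2.0 * x[0] + x[1] ** 2
fidelity_gap = block.max_abs_error()

block.use_surrogate = True                          # switch to the symbolic pass
symbolic_out = block((1.0, 3.0))
```

Keeping the original component inside the wrapper is what makes the switch reversible: setting `use_surrogate` back to `False` restores the neural forward pass, and the cache allows the surrogate's fidelity to be checked before committing to it.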

Figure 1 : A cartoon depicting how SymTorch is used to perform symbolic distillation on model components. For a trained PyTorch model, SymTorch wraps around any NN component in the model. The user passes in sample data and in the forward pass, the inputs and outputs (I/O) of the component are collec

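Experimental Results

Research Questions

RQ1: Can symbolic regression faithfully approximate the input-output mappings of neural network components across diverse architectures?
RQ2: In Transformers/LLMs, what are the practical benefits and trade-offs (accuracy versus speed) of replacing neural blocks with symbolic surrogates?
RQ3: To what extent does symbolic distillation recover known physical laws, and to what extent can it expose learned arithmetic biases in LLMs?
RQ4: How effective are SLIME-style symbolic explanations as a form of local interpretability for black-box models?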

Key Findings

Label	Baseline perplexity	Δ perplexity (PCA+MLP)	Δ perplexity (PCA+SymTorch)	Δ perplexity (control)
Baseline	10.62	+3.11	+3.14	+6.97
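
To put the deltas in the table into perspective, the short calculation below converts them into absolute perplexities and relative increases. It assumes each Δ is added directly to the 10.62 baseline. The variable and function names are illustrative, and the "control" label is copied from the table without further interpretation.

```python
from typing import Dict, Tuple

BASELINE_PPL = 10.62
DELTAS: Dict[str, float] = {
    "PCA+MLP": 3.11,
    "PCA+SymTorch": 3.14,
    "control": 6.97,
}


def summarize(baseline: float, deltas: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each variant to (absolute perplexity, relative increase in percent)."""
    return {
        name: (round(baseline + delta, 2), round(100.0 * delta / baseline, 1))
        for name, delta in deltas.items()
    }


summary = summarize(BASELINE_PPL, DELTAS)
for name, (ppl, rel) in summary.items():
    print(f"{name}: perplexity {ppl}, +{rel}% over baseline")
```

Under that additive assumption, the two PCA-based variants land within 0.03 perplexity of each other, while the control roughly doubles the degradation.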

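SymTorch enables symbolic distillation across GNNs, PINNs, and transformer blocks.
Replacing three MLP layers in a transformer with symbolic surrogates raises throughput by 8.3%, at the cost of a perplexity increase of +3.14 over the 10.62 baseline.
Symbolic surrogates fitted in a PCA-reduced subspace capture the MLP's behavior with a perplexity degradation comparable to that of PCA alone (+3.14 for PCA+SymTorch versus +3.11 for PCA+MLP).
PINN-based PDE solutions can be recovered as closed-form expressions after symbolic distillation.
GNN-based symbolic distillation recovers the true interaction laws, consistent with earlier results on inductive biases.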

Figure 2 : Approximating local model behavior with SLIME. For a complex non-linear model, we choose the point of interest $\mathbf{x}^{*}$. We sample points around this region and fit a symbolic model to these points.
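
The local-fitting idea in the caption can be sketched in one dimension as follows. This is a simplified stand-in, not the paper's procedure: Gaussian sampling around $x^{*}$, the small fixed library of candidate forms, and the names `fit_affine` and `local_surrogate` are all assumptions made for illustration. A real symbolic regressor such as PySR searches a much larger space of expressions instead of ranking a hand-picked list.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Fn = Callable[[float], float]


def fit_affine(basis: Fn, xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y ~ a * basis(x) + b; returns (a, b, mse)."""
    n = len(xs)
    fs = [basis(x) for x in xs]
    mean_f = sum(fs) / n
    mean_y = sum(ys) / n
    var_f = sum((f - mean_f) ** 2 for f in fs)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(fs, ys))
    a = cov / var_f if var_f > 0 else 0.0
    b = mean_y - a * mean_f
    mse = sum((a * f + b - y) ** 2 for f, y in zip(fs, ys)) / n
    return a, b, mse


def local_surrogate(model: Fn, x_star: float, radius: float = 0.3,
                    n_samples: int = 200, seed: int = 0) -> Tuple[str, float, float, float]:
    """Sample near x_star and return the best (name, a, b, mse) candidate."""
    rng = random.Random(seed)
    xs = [x_star + rng.gauss(0.0, radius) for _ in range(n_samples)]
    ys = [model(x) for x in xs]              # query the black box locally
    candidates: Dict[str, Fn] = {            # hypothetical expression library
        "x": lambda x: x,
        "x^2": lambda x: x * x,
        "sin(x)": math.sin,
        "exp(x)": math.exp,
    }
    fits = [(name, *fit_affine(fn, xs, ys)) for name, fn in candidates.items()]
    return min(fits, key=lambda fit: fit[3])  # lowest mean squared error wins


def black_box(x: float) -> float:
    """Toy model whose local form the surrogate should rediscover."""
    return 3.0 * math.sin(x) + 0.5


name, a, b, mse = local_surrogate(black_box, x_star=1.0)
print(f"local surrogate near x*=1.0: {a:.3f} * {name} + {b:.3f} (mse={mse:.2e})")
```

The sampling radius controls the trade-off: a very small neighborhood makes almost every smooth candidate fit well, so the choice between forms becomes uninformative, while a very large one stops being a local explanation.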


This review was generated by AI and checked by a human editor.