QUICK REVIEW

[Paper Review] ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Liu Yang, Zeyu Nie|arXiv (Cornell University)|Mar 3, 2026

Parallel Computing and Optimization Techniques | Citations: 0

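One-Sentence Summary

ParEVO trains specialized LLMs and uses an Evolutionary Coding Agent to synthesize and optimize high-performance parallel code for irregular data. It achieves large speedups on ParEval and matches expert human baselines.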

ABSTRACT

The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. This challenge is magnified for irregular data structures (such as sparse graphs, unbalanced trees, and non-uniform meshes) where static scheduling fails and data dependencies are unpredictable. Current Large Language Models (LLMs) often fail catastrophically on these tasks, generating code plagued by subtle race conditions, deadlocks, and sub-optimal scaling. We bridge this gap with ParEVO, a framework designed to synthesize high-performance parallel algorithms for irregular data. Our contributions include: (1) The Parlay-Instruct Corpus, a curated dataset of 13,820 tasks synthesized via a "Critic-Refine" pipeline that explicitly filters for empirically performant algorithms that effectively utilize Work-Span parallel primitives; (2) specialized DeepSeek, Qwen, and Gemini models fine-tuned to align probabilistic generation with the rigorous semantics of the ParlayLib library; and (3) an Evolutionary Coding Agent (ECA) that improves the "last mile" of correctness by iteratively repairing code using feedback from compilers, dynamic race detectors, and performance profilers. On the ParEval benchmark, ParEVO achieves an average 106x speedup (with a maximum of 1103x) across the suite, and a robust 13.6x speedup specifically on complex irregular graph problems, outperforming state-of-the-art commercial models. Furthermore, our evolutionary approach matches state-of-the-art expert human baselines, achieving up to a 4.1x speedup on specific highly-irregular kernels. Source code and datasets are available at https://github.com/WildAlg/ParEVO.

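Motivation and Objectives

Motivates and addresses the challenge of parallelizing irregular-data workloads (graphs, sparse structures), where static scheduling fails and LLMs exhibit a sequential bias.
Proposes a data-oriented synthesis pipeline that generates correct-by-construction parallel code using ParlayLib primitives.
Introduces an Evolutionary Coding Agent (ECA) that iteratively improves code using feedback from compilers, race detectors, and profilers.
Fine-tunes multiple LLMs grounded in ParlayLib and Rust primitives to align them with the correct semantics of parallel primitives.
Demonstrates strong empirical gains on C++ and Rust benchmarks compared with commercial models and expert human baselines.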

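Proposed Method

Builds the Parlay-Instruct corpus with a Teacher-Student-Critic loop that uses strict compile-and-test verification.
Fine-tunes DeepSeek, Qwen, and Gemini models on ParlayLib and Rust primitives using LoRA and a multi-stage alignment pipeline.
Formulates code generation as an evolutionary search over ASTs, evaluating a population of candidates with a fitness based on compilation, tests, race detection, and performance.
Uses MAP-Elites selection to maintain diversity in code length, cyclomatic complexity, and synchronization frequency.
For correctness evaluation, relies on external deterministic tools (compilers, dynamic race detectors) as ground-truth judges rather than on an LLM.
Measures Build@1, Pass@1, and Speedup@1 on ParEval, the PBBS/RPB baselines, and held-out DMOJ problems.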
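
The MAP-Elites selection step described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the Candidate record, the bin sizes in descriptor, the random mutate step (standing in for an LLM-proposed rewrite), and the toy evaluate function (standing in for compiling, testing, race-checking, and profiling) are all hypothetical. The sketch only shows the archive rule: candidates are binned by code length, cyclomatic complexity, and synchronization count, and each bin keeps its fittest candidate.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one generated program (not the paper's AST)."""
    code_length: int  # lines of code
    complexity: int   # cyclomatic complexity
    sync_count: int   # synchronization points (locks, barriers, atomics)


def evaluate(c: Candidate) -> float:
    """Toy fitness. A real agent would compile, test, race-check, and profile."""
    if c.sync_count > 8:
        return 0.0  # stand-in for "failed a correctness gate"
    return 100.0 / (1 + abs(c.code_length - 40) + 2 * c.sync_count + c.complexity)


def descriptor(c: Candidate) -> tuple:
    """Map a candidate to its archive cell (bin sizes are arbitrary assumptions)."""
    return (
        min(c.code_length // 20, 4),
        min(c.complexity // 5, 3),
        min(c.sync_count // 3, 3),
    )


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    """Randomly perturb each feature; stands in for an LLM-proposed rewrite."""
    return Candidate(
        code_length=max(1, c.code_length + rng.randint(-10, 10)),
        complexity=max(1, c.complexity + rng.randint(-2, 2)),
        sync_count=max(0, c.sync_count + rng.randint(-1, 1)),
    )


def map_elites(iterations: int, seed: int = 0) -> dict:
    """Return an archive mapping each visited cell to (elite, fitness)."""
    rng = random.Random(seed)
    start = Candidate(code_length=80, complexity=12, sync_count=6)
    archive = {descriptor(start): (start, evaluate(start))}
    for _ in range(iterations):
        # Pick a parent uniformly from the current elites.
        parent, _parent_fitness = rng.choice(list(archive.values()))
        child = mutate(parent, rng)
        score = evaluate(child)
        cell = descriptor(child)
        # Keep the child only if its cell is empty or it beats the current elite.
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (child, score)
    return archive


if __name__ == "__main__":
    archive = map_elites(iterations=30)
    best, score = max(archive.values(), key=lambda entry: entry[1])
    print(len(archive), best, round(score, 2))
```

Because elites are replaced only within their own cell, a slower but structurally different candidate (for example, one with fewer synchronization points) survives as a parent for later mutations instead of being discarded by a single global ranking.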

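Experimental Results

Research Questions

RQ1: Can LLMs fine-tuned on ParlayLib primitives produce compilable, semantically correct parallel code for irregular data?
RQ2: Does an Evolutionary Coding Agent that uses runtime feedback improve correctness and performance beyond single-shot generation?
RQ3: What performance trade-off (an "alignment tax") exists between stronger semantic grounding and peak execution speed?
RQ4: How well does ParEVO generalize to Rust and C++ irregular parallel workloads outside its training data?
RQ5: To what extent does ParEVO match or outperform expert human-written baselines on complex irregular kernels?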

Key Findings

Model/Method	Language	Runtime (s)	Speedup (1T)	Speedup (Base)
Baseline code (PBBS)	C++	1.24	–	–
PAREVO (GEMINI)	Rust	0.07728	0.938x	4.125x
PAREVO (GEMINI)	Rust	0.1928	21.43286835x	1.0708x
PAREVO (GEMINI)	C++	1.169	23.689x	1.061x
PAREVO (GEMINI)	C++	0.019	>13.94x	2.68421x
PAREVO (GEMINI)	Rust	0.08865	15.482x	1.305x
PAREVO (GEMINI)	C++	6.627	>1.814x	2.68421x
DeepSeek-Parlay	Parlay	0.33	–	–
Gemini-2.5-Parlay	Parlay	0.33	106.87x	0.84x
Qwen3-Parlay	Parlay	0.33	8.63x	0.50x
DeepSeek-6.7B-Base	Parlay	0.11	3.65x	0.89x

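Gemini-2.5-Parlay runs on average 106x faster than the baseline on ParEval tasks, with a maximum speedup of 1103x.
The fine-tuned models achieve near-perfect Build@1, with Pass@1 and Speedup@1 substantially higher than the base models.
ParEVO's semantic alignment enables correct use of ParlayLib primitives (for example, sort_inplace with generic auto parameters for complex types), which drives runtime improvements (such as 17.5x on a complex sorting task).
The Evolutionary Coding Agent shows a 2.2x performance improvement over single-shot generation in an ablation study with 30 iterations.
ParEVO matches or exceeds the expert PBBS/RPB baselines on several problems (for example, maximal independent set and maximal matching), with a notable speedup of up to about 4.1x on a Rust kernel.
Strong-scaling results show nearly linear speedup up to 64 cores on several irregular kernels, and FFT/DFT scaling reaches about 40x.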
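
To make the reported metrics concrete, here is a small Python sketch of how Build@1, Pass@1, Speedup@1, and parallel efficiency could be computed from per-sample results. The Sample record and its field names are hypothetical, and treating a failed sample as contributing a speedup of 0 is an assumption for illustration, not necessarily the benchmark's exact definition. The efficiency line also assumes the roughly 40x FFT/DFT figure was measured on 64 cores, which the findings suggest but do not state.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """Outcome of one generated solution for one task (hypothetical record)."""
    built: bool        # compiled successfully
    passed: bool       # passed the task's correctness tests
    runtime_s: float   # measured runtime of the generated code, in seconds
    baseline_s: float  # runtime of the reference implementation, in seconds


def build_at_1(samples):
    """Fraction of single-shot samples that compile."""
    return mean(1.0 if s.built else 0.0 for s in samples)


def pass_at_1(samples):
    """Fraction of single-shot samples that compile and pass the tests."""
    return mean(1.0 if (s.built and s.passed) else 0.0 for s in samples)


def speedup_at_1(samples):
    """Mean speedup over the baseline; failed samples count as 0 (assumption)."""
    return mean(
        s.baseline_s / s.runtime_s if (s.built and s.passed) else 0.0
        for s in samples
    )


def parallel_efficiency(speedup, cores):
    """Speedup divided by core count; 1.0 means perfectly linear scaling."""
    return speedup / cores


if __name__ == "__main__":
    samples = [
        Sample(built=True, passed=True, runtime_s=0.5, baseline_s=2.0),
        Sample(built=True, passed=False, runtime_s=1.0, baseline_s=2.0),
        Sample(built=False, passed=False, runtime_s=0.0, baseline_s=2.0),
        Sample(built=True, passed=True, runtime_s=1.0, baseline_s=1.0),
    ]
    print(build_at_1(samples), pass_at_1(samples), speedup_at_1(samples))
    print(parallel_efficiency(40, 64))
```

Under these assumptions, a 40x speedup on 64 cores corresponds to a parallel efficiency of 0.625, which is why it reads as weaker scaling than the nearly linear results reported for the other kernels.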

This review was generated by AI and checked by a human editor.