QUICK REVIEW

[Paper Review] Rethinking Code Similarity for Automated Algorithm Design with LLMs

Rui Zhang, Zhichao Lu|arXiv (Cornell University)|Mar 3, 2026

Machine Learning and Data Classification|Citations: 0

ONE-LINE SUMMARY

The paper introduces BehaveSim, a behavioral similarity metric for algorithms based on problem-solving trajectories and dynamic time warping, to better assess algorithmic similarity in LLM-AAD settings.

ABSTRACT

The rise of Large Language Model-based Automated Algorithm Design (LLM-AAD) has transformed algorithm development by autonomously generating code implementations of expert-level algorithms. Unlike traditional expert-driven algorithm development, in the LLM-AAD paradigm, the main design principle behind an algorithm is often implicitly embedded in the generated code. Therefore, assessing algorithmic similarity directly from code, distinguishing genuine algorithmic innovation from mere syntactic variation, becomes essential. While various code similarity metrics exist, they fail to capture algorithmic similarity, as they focus on surface-level syntax or output equivalence rather than the underlying algorithmic logic. We propose BehaveSim, a novel method to measure algorithmic similarity through the lens of problem-solving behavior as a sequence of intermediate solutions produced during execution, dubbed as problem-solving trajectories (PSTrajs). By quantifying the alignment between PSTrajs using dynamic time warping (DTW), BehaveSim distinguishes algorithms with divergent logic despite syntactic or output-level similarities. We demonstrate its utility in two key applications: (i) Enhancing LLM-AAD: Integrating BehaveSim into existing LLM-AAD frameworks (e.g., FunSearch, EoH) promotes behavioral diversity, significantly improving performance on three AAD tasks. (ii) Algorithm analysis: BehaveSim clusters generated algorithms by behavior, enabling systematic analysis of problem-solving strategies--a crucial tool for the growing ecosystem of AI-generated algorithms. Data and code of this work are open-sourced at https://github.com/RayZhhh/behavesim.

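MOTIVATION AND OBJECTIVES

Motivates the need to measure algorithmic similarity in LLM-AAD from the perspective of problem-solving behavior.
Proposes a behavioral similarity metric (BehaveSim) based on problem-solving trajectories (PSTrajs).
Shows how PSTraj-based similarity improves diversity and performance in LLM-AAD frameworks.
Shows that BehaveSim enables quantitative analysis and clustering of AI-generated algorithms.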

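PROPOSED METHOD

Defines a problem-solving trajectory (PSTraj) as the sequence of intermediate solutions produced during execution.
Measures behavioral similarity by computing the distance between PSTrajs with dynamic time warping (DTW).
Contrasts BehaveSim with static (token-, structure-, and embedding-based) and execution-based similarity metrics.
Integrates BehaveSim into LLM-AAD frameworks such as FunSearch and EoH to promote behavioral diversity.
Releases the data and code as open source in the GitHub repository linked in the abstract.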
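The trajectory-plus-DTW idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (the linked repository holds that). It assumes each intermediate solution is encoded as a numeric feature vector compared with Euclidean distance. The mapping from DTW distance to a similarity score (`1 / (1 + d)`), the novelty `threshold`, and the helper names `behave_sim` and `is_behaviorally_novel` are hypothetical choices made for the example.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical encoding: each intermediate solution is a numeric feature vector.
Solution = Tuple[float, ...]


def dtw_distance(traj_a: Sequence[Solution], traj_b: Sequence[Solution],
                 step_dist: Callable[[Solution, Solution], float] = math.dist) -> float:
    """Classic O(n*m) dynamic time warping distance between two PSTrajs."""
    n, m = len(traj_a), len(traj_b)
    # cost[i][j]: minimal cumulative cost of aligning traj_a[:i] with traj_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = step_dist(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance traj_a only
                                 cost[i][j - 1],      # advance traj_b only
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def behave_sim(traj_a: Sequence[Solution], traj_b: Sequence[Solution]) -> float:
    """Hypothetical mapping of DTW distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dtw_distance(traj_a, traj_b))


def is_behaviorally_novel(candidate, population, threshold: float = 0.9) -> bool:
    """Hypothetical diversity filter: admit a candidate only if it is not
    too behaviorally similar to any algorithm already in the population."""
    return all(behave_sim(candidate, p) < threshold for p in population)


a = [(0.0,), (1.0,), (2.0,)]
b = [(0.0,), (0.0,), (1.0,), (2.0,)]  # same behavior, one extra step
c = [(2.0,), (1.0,), (0.0,)]          # reversed behavior
print(dtw_distance(a, b))  # 0.0
print(dtw_distance(a, c))  # 4.0
print(is_behaviorally_novel(c, [a]))  # True
```

DTW lets one trajectory "wait" while the other advances, so `a` and `b` align perfectly despite their different lengths. The reversed trajectory `c` cannot be aligned cheaply and would pass the diversity filter.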

Figure 1: Examples demonstrating existing code similarity metrics are insufficient for measuring algorithmic similarity. (a) Existing code similarity metrics, on the one hand, find the breadth-first search (BFS) and depth-first search (DFS) algorithms highly similar, despite the two algorithms being

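EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can problem-solving trajectories capture the underlying algorithmic logic beyond surface-level syntax or outputs?
RQ2: Does BehaveSim improve the performance of existing LLM-AAD methods by promoting behavioral diversity?
RQ3: Can BehaveSim cluster AI-generated algorithms by their problem-solving behavior and enable quantitative analysis?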

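KEY FINDINGS

BehaveSim distinguishes algorithms whose problem-solving behavior differs, even when their code structure or outputs are similar.
Integrating BehaveSim into FunSearch and EoH improves performance on three AAD tasks.
BehaveSim can cluster generated algorithms by behavior, supporting the analysis of problem-solving strategies.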
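Behavior-based clustering can be sketched on top of pairwise DTW distances. This review does not describe the paper's clustering procedure, so the example swaps in a simple single-linkage rule: union-find over pairs whose DTW distance is at most a hypothetical `cutoff`. The toy trajectories, the numeric solution encoding, and the name `cluster_by_behavior` are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Solution = Tuple[float, ...]  # hypothetical numeric encoding of an intermediate solution


def dtw_distance(a: Sequence[Solution], b: Sequence[Solution]) -> float:
    """Standard O(n*m) DTW recurrence with a Euclidean step cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_by_behavior(trajs: Dict[str, Sequence[Solution]], cutoff: float) -> List[List[str]]:
    """Single-linkage clustering: two algorithms share a cluster if a chain of
    pairs, each within `cutoff` DTW distance, connects them."""
    parent = {name: name for name in trajs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(trajs, 2):
        if dtw_distance(trajs[u], trajs[v]) <= cutoff:
            parent[find(u)] = find(v)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for name in trajs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


toy = {
    "alg_a": [(0.0,), (1.0,), (2.0,)],
    "alg_b": [(0.0,), (0.0,), (1.0,), (2.0,)],
    "alg_c": [(2.0,), (1.0,), (0.0,)],
}
print(cluster_by_behavior(toy, cutoff=1.0))  # [['alg_a', 'alg_b'], ['alg_c']]
```

Single linkage is used here only because it needs nothing beyond a distance matrix and a cutoff. Any clustering method that accepts precomputed distances could take its place.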

Figure 2: Problem-solving behaviors on the traveling salesman problem (TSP) for two algorithms with highly similar codes. The only distinction in their implementations lies in the use of argmin() and argmax() , which leads to profoundly different behaviors: Algorithm 1 chooses the nearest neighbor n
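The argmin/argmax contrast in Figure 2 can be reproduced on a toy instance. Because the caption is truncated, this sketch makes several assumptions. Both algorithms are treated as greedy tour constructions that pick the next city by its distance from the current city. Python's `min` stands in for argmin (nearest neighbor), and `max` stands in for argmax, which is assumed here to mean farthest neighbor. The partial tour after each step is recorded as the PSTraj. The instance and the function name `greedy_tour_trajectory` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_tour_trajectory(points: Sequence[Point], pick: Callable,
                           start: int = 0) -> List[Tuple[int, ...]]:
    """Build a tour city by city; `pick` is min (argmin) or max (argmax) over
    distances from the current city. Returns the partial tour after every step."""
    tour = [start]
    unvisited = set(range(len(points))) - {start}
    trajectory = [tuple(tour)]
    while unvisited:
        current = points[tour[-1]]
        # The single line that differs between the two algorithms
        # (sorted() makes tie-breaking deterministic):
        nxt = pick(sorted(unvisited), key=lambda c: math.dist(current, points[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
        trajectory.append(tuple(tour))
    return trajectory


cities = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
nearest = greedy_tour_trajectory(cities, min)
farthest = greedy_tour_trajectory(cities, max)
print(nearest)   # [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]
print(farthest)  # [(0,), (0, 3), (0, 3, 1), (0, 3, 1, 2)]
```

Both runs return a valid permutation of the cities, so a check that only asks whether the output is a valid tour would not separate them. Their trajectories, however, diverge from the first step onward. A DTW comparison of the encoded partial tours is the kind of signal that can quantify this gap.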


This review was generated by AI and checked by a human editor.