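[Paper Digest] Rethinking Code Similarity for Automated Algorithm Design with LLMs
The paper introduces BehaveSim, a measure of algorithmic behavioral similarity built on problem-solving trajectories and dynamic time warping, intended to assess algorithm similarity more faithfully in the LLM-AAD setting.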
The rise of Large Language Model-based Automated Algorithm Design (LLM-AAD) has transformed algorithm development by autonomously generating code implementations of expert-level algorithms. Unlike traditional expert-driven algorithm development, in the LLM-AAD paradigm, the main design principle behind an algorithm is often implicitly embedded in the generated code. Therefore, assessing algorithmic similarity directly from code, distinguishing genuine algorithmic innovation from mere syntactic variation, becomes essential. While various code similarity metrics exist, they fail to capture algorithmic similarity, as they focus on surface-level syntax or output equivalence rather than the underlying algorithmic logic. We propose BehaveSim, a novel method to measure algorithmic similarity through the lens of problem-solving behavior as a sequence of intermediate solutions produced during execution, dubbed as problem-solving trajectories (PSTrajs). By quantifying the alignment between PSTrajs using dynamic time warping (DTW), BehaveSim distinguishes algorithms with divergent logic despite syntactic or output-level similarities. We demonstrate its utility in two key applications: (i) Enhancing LLM-AAD: Integrating BehaveSim into existing LLM-AAD frameworks (e.g., FunSearch, EoH) promotes behavioral diversity, significantly improving performance on three AAD tasks. (ii) Algorithm analysis: BehaveSim clusters generated algorithms by behavior, enabling systematic analysis of problem-solving strategies--a crucial tool for the growing ecosystem of AI-generated algorithms. Data and code of this work are open-sourced at https://github.com/RayZhhh/behavesim.
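Motivation and goals
- Explain why algorithm similarity in LLM-AAD needs to be measured from the perspective of problem-solving behavior.
- Propose BehaveSim, a behavioral similarity measure based on problem-solving trajectories (PSTrajs).
- Demonstrate how similarity derived from PSTraj alignment improves diversity and performance within LLM-AAD frameworks.
- Show how BehaveSim enables quantitative analysis and clustering of AI-generated algorithms.
Proposed method
- Define a problem-solving trajectory (PSTraj) as the sequence of intermediate solutions produced during execution.
- Measure behavioral similarity by computing pairwise distances between PSTrajs with Dynamic Time Warping (DTW).
- Compare BehaveSim against static (token-, structure-, and embedding-based) and execution-based similarity measures.
- Integrate BehaveSim into LLM-AAD frameworks (e.g., FunSearch, EoH) to promote behavioral diversity.
- Release the data and code as open source in the accompanying GitHub repository.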
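The core of the method, aligning two PSTrajs with DTW, can be illustrated with a minimal sketch. It is not the authors' implementation: the encoding of an intermediate solution as a vector of floats, the Euclidean point distance, and the `1 / (1 + d)` mapping from distance to similarity are all assumptions made for illustration, and the names `point_distance`, `dtw_distance`, and `behavior_similarity` are hypothetical.

```python
import math
from typing import Sequence

# Assumption: each intermediate solution is encoded as a fixed-length vector.
Solution = Sequence[float]


def point_distance(a: Solution, b: Solution) -> float:
    """Euclidean distance between two intermediate solutions (an assumption)."""
    return math.dist(a, b)


def dtw_distance(traj_a: Sequence[Solution], traj_b: Sequence[Solution]) -> float:
    """Classic dynamic time warping distance, O(len(a) * len(b)) time."""
    n, m = len(traj_a), len(traj_b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i and first j steps.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = point_distance(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # traj_b stays on the same step
                cost[i][j - 1],      # traj_a stays on the same step
                cost[i - 1][j - 1],  # both advance one step
            )
    return cost[n][m]


def behavior_similarity(traj_a: Sequence[Solution], traj_b: Sequence[Solution]) -> float:
    """Map a DTW distance to (0, 1]; this mapping is illustrative only."""
    return 1.0 / (1.0 + dtw_distance(traj_a, traj_b))


# Two trajectories that visit the same states at different speeds,
# and a third that moves in a different direction.
fast = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
slow = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(behavior_similarity(fast, slow), behavior_similarity(fast, other))
```

Because DTW lets one trajectory pause while the other advances, `fast` and `slow` align at zero cost even though they differ in length, whereas `other` accumulates distance at every step. This tolerance to timing differences is what makes DTW a natural fit for comparing trajectories of unequal length.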

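Experimental results
Research questions
- RQ1: Do problem-solving trajectories capture the underlying algorithmic logic beyond surface-level syntax or outputs?
- RQ2: Does BehaveSim improve the performance of existing LLM-AAD methods by promoting behavioral diversity?
- RQ3: Can BehaveSim cluster AI-generated algorithms by problem-solving behavior to support quantitative analysis?
Key findings
- BehaveSim distinguishes algorithms whose problem-solving behavior differs even when their code structure or outputs are similar.
- Integrating BehaveSim into FunSearch and EoH improves performance on three AAD tasks.
- BehaveSim clusters generated algorithms by behavior, which helps in analyzing problem-solving strategies.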
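To make the clustering finding concrete, the sketch below groups algorithms from a precomputed pairwise behavioral distance matrix. The clustering procedure used in the paper is not described in this digest, so threshold-based single-linkage grouping is swapped in as a simple stand-in; the function name `cluster_by_distance`, the `threshold` parameter, and the example distances are hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_by_distance(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group items whose pairwise distance is <= threshold (single linkage).

    `dist` is assumed to be a symmetric matrix of behavioral distances,
    for example DTW distances between problem-solving trajectories.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest over algorithm indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical distances: algorithms 0 and 1 behave alike, 2 does not.
distances = [
    [0.0, 0.1, 5.0],
    [0.1, 0.0, 4.8],
    [5.0, 4.8, 0.0],
]
print(cluster_by_distance(distances, threshold=1.0))  # [[0, 1], [2]]
```

Single linkage needs only one tunable value and no external libraries, which keeps the example short; a study of real generated algorithms would likely call for a more robust clustering method.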

This digest was generated by AI and reviewed by a human editor.