QUICK REVIEW

[Paper Review] Rethinking Code Similarity for Automated Algorithm Design with LLMs

Rui Zhang, Zhichao Lu | arXiv (Cornell University) | 2026-03-03

Machine Learning and Data Classification | Citations: 0

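One-Line Summary

The paper presents BehaveSim, a behavioral similarity metric for algorithms built on problem-solving trajectories and dynamic time warping (DTW), which improves how similarity between algorithms is assessed in the LLM-AAD setting.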

ABSTRACT

The rise of Large Language Model-based Automated Algorithm Design (LLM-AAD) has transformed algorithm development by autonomously generating code implementations of expert-level algorithms. Unlike traditional expert-driven algorithm development, in the LLM-AAD paradigm, the main design principle behind an algorithm is often implicitly embedded in the generated code. Therefore, assessing algorithmic similarity directly from code, distinguishing genuine algorithmic innovation from mere syntactic variation, becomes essential. While various code similarity metrics exist, they fail to capture algorithmic similarity, as they focus on surface-level syntax or output equivalence rather than the underlying algorithmic logic. We propose BehaveSim, a novel method to measure algorithmic similarity through the lens of problem-solving behavior as a sequence of intermediate solutions produced during execution, dubbed as problem-solving trajectories (PSTrajs). By quantifying the alignment between PSTrajs using dynamic time warping (DTW), BehaveSim distinguishes algorithms with divergent logic despite syntactic or output-level similarities. We demonstrate its utility in two key applications: (i) Enhancing LLM-AAD: Integrating BehaveSim into existing LLM-AAD frameworks (e.g., FunSearch, EoH) promotes behavioral diversity, significantly improving performance on three AAD tasks. (ii) Algorithm analysis: BehaveSim clusters generated algorithms by behavior, enabling systematic analysis of problem-solving strategies--a crucial tool for the growing ecosystem of AI-generated algorithms. Data and code of this work are open-sourced at https://github.com/RayZhhh/behavesim.

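Research Motivation and Objectives

Motivates the need to measure algorithmic similarity in LLM-AAD from the perspective of problem-solving behavior.
Proposes BehaveSim, a behavioral similarity metric based on problem-solving trajectories (PSTrajs).
Shows how PSTraj-based similarity improves the diversity and performance of LLM-AAD frameworks.
Shows how BehaveSim enables quantitative analysis and clustering of AI-generated algorithms.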

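Proposed Method

Defines a problem-solving trajectory (PSTraj) as the sequence of intermediate solutions produced during execution.
Measures behavioral similarity by computing pairwise distances between PSTrajs with dynamic time warping (DTW).
Contrasts BehaveSim with static (token, structure, embedding) and execution-based similarity measures.
Integrates BehaveSim into LLM-AAD frameworks (e.g., FunSearch, EoH) to promote behavioral diversity.
Releases the data and code as open source in the GitHub repository (https://github.com/RayZhhh/behavesim).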
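
The DTW step in the list above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each intermediate solution is a fixed-length numeric vector, uses Euclidean distance as the local cost, and applies the textbook dynamic-programming form of DTW. The names `euclidean` and `dtw_distance` are hypothetical, and the paper's actual encoding and normalization of PSTrajs may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # assumed encoding of one intermediate solution


def euclidean(a: Point, b: Point) -> float:
    """Local cost between two intermediate solutions (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(traj_a: Sequence[Point], traj_b: Sequence[Point]) -> float:
    """Classic O(n*m) dynamic time warping distance between two trajectories."""
    n, m = len(traj_a), len(traj_b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i and first j steps
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = euclidean(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # traj_a advances alone
                cost[i][j - 1],      # traj_b advances alone
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Same path with one repeated step: DTW absorbs the timing difference.
same_path = dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
# Paths that never coincide accumulate cost at every aligned step.
different_path = dtw_distance([[0.0], [0.0]], [[1.0], [1.0]])
```

Here `same_path` evaluates to 0.0 and `different_path` to 2.0, which shows why DTW suits trajectories of unequal length: a difference in pacing alone costs nothing, while a difference in the visited solutions does.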

Figure 1: Examples demonstrating existing code similarity metrics are insufficient for measuring algorithmic similarity. (a) Existing code similarity metrics, on the one hand, find the breadth-first search (BFS) and depth-first search (DFS) algorithms highly similar, despite the two algorithms being

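Experimental Results

Research Questions

RQ1: Can problem-solving trajectories capture the underlying algorithmic logic beyond surface syntax or outputs?
RQ2: Does BehaveSim improve the performance of existing LLM-AAD methods by promoting behavioral diversity?
RQ3: Does BehaveSim enable quantitative analysis by clustering AI-generated algorithms according to their problem-solving behavior?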

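Key Results

BehaveSim distinguishes algorithms that exhibit different problem-solving behaviors even when their code structure or outputs are similar.
Integrating BehaveSim into FunSearch and EoH improves performance on three AAD tasks.
BehaveSim can cluster generated algorithms by behavior, which supports the analysis of problem-solving strategies.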
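
To make the clustering result concrete, the sketch below groups algorithms from a precomputed matrix of pairwise behavioral distances. It is an illustration under stated assumptions, not the paper's procedure: the review does not say which clustering method the authors use, so a simple threshold-based single-linkage grouping stands in for it. The function name `cluster_by_threshold`, the example matrix, and the threshold value are all hypothetical.

```python
from typing import Dict, List


def cluster_by_threshold(dist: List[List[float]], threshold: float) -> List[List[int]]:
    """Group items whose pairwise distance is <= threshold (single linkage).

    `dist` is assumed to be a symmetric matrix of behavioral distances,
    e.g. DTW distances between problem-solving trajectories.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest, one node per algorithm

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical distances: algorithms 0 and 1 behave alike, 2 is an outlier.
example = [
    [0.0, 0.1, 5.0],
    [0.1, 0.0, 5.0],
    [5.0, 5.0, 0.0],
]
clusters = cluster_by_threshold(example, threshold=1.0)
```

With this input, `clusters` is `[[0, 1], [2]]`. Each group can then be inspected as one problem-solving strategy.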

Figure 2: Problem-solving behaviors on the traveling salesman problem (TSP) for two algorithms with highly similar codes. The only distinction in their implementations lies in the use of argmin() and argmax() , which leads to profoundly different behaviors: Algorithm 1 chooses the nearest neighbor n
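
The effect described in the Figure 2 caption can be reproduced with a small sketch. It assumes a greedy tour builder in which Python's built-in `min` and `max` stand in for `argmin()` and `argmax()`, and it records the visiting order as the trajectory. The function `greedy_tour` and the city coordinates are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Tuple

City = Tuple[float, float]


def greedy_tour(cities: List[City], pick: Callable = min, start: int = 0) -> List[int]:
    """Build a tour greedily; the visiting order serves as the trajectory.

    pick=min selects the closest unvisited city at each step, pick=max the
    most distant one. Nothing else in the code changes.
    """
    tour = [start]
    unvisited = set(range(len(cities))) - {start}
    while unvisited:
        current = cities[tour[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins)
        nxt = pick(sorted(unvisited), key=lambda i: math.dist(current, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


# Hypothetical instance: four cities on a line.
cities = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
nearest_first = greedy_tour(cities, pick=min)
farthest_first = greedy_tour(cities, pick=max)
```

On this instance `nearest_first` is `[0, 1, 2, 3]` and `farthest_first` is `[0, 3, 1, 2]`. The two programs differ by a single token, yet their step-by-step behavior diverges from the first move, which is the kind of difference a trajectory-based metric is meant to expose.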

This review was generated by AI and checked by a human editor.