QUICK REVIEW

[Paper Walkthrough] CktEvo: Repository-Level RTL Code Benchmark for Design Evolution

Zhengyuan Shi, Jingxin Wang | arXiv (Cornell University) | Feb 10, 2026

Embedded Systems Design Techniques | Cited by 0

One-Sentence Summary

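TL;DR: CktEvo introduces a repository-level RTL benchmark and a closed-loop LLM framework for evolving multi-file Verilog repositories, improving PPA proxies (area, delay, power) while preserving functionality. Using an open-source toolchain and no human input, the method achieves measurable reductions in ADP (area-delay product), with gains that vary across designs.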

ABSTRACT

Register-Transfer Level (RTL) coding is an iterative, repository-scale process in which Power, Performance, and Area (PPA) emerge from interactions across many files and the downstream toolchain. While large language models (LLMs) have recently been applied to hardware design, most efforts focus on generation or debugging from natural-language prompts, where ambiguity and hallucinations necessitate expert review. A separate line of work begins from formal inputs, yet typically optimizes high-level synthesis or isolated modules and remains decoupled from cross-file dependencies. In this work, we present CktEvo, a benchmark and reference framework for repo-level RTL evolution. Unlike prior benchmarks consisting of isolated snippets, our benchmark targets complete IP cores where PPA emerges from cross-file dependencies. Our benchmark packages several high-quality Verilog repositories from real-world designs. We formalize the task as: given an initial repository, produce edits that preserve functional behavior while improving PPA. We also provide a closed-loop framework that couples LLM-proposed edits with toolchain feedback to enable cross-file modifications and iterative repair at repository scale. Our experiments demonstrate that the reference framework realizes PPA improvements without any human interactions. CktEvo establishes a rigorous and executable foundation for studying LLM-assisted RTL optimization that matters for engineering practice: repository-level, function-preserving, and PPA-driven.

Motivation and Goals

Formally define repo-level RTL evolution as function-preserving, PPA-driven multi-file optimization across a design repository.
Provide a high-quality, multi-file Verilog benchmark (CktEvo) spanning diverse domains.
Propose a closed-loop framework that couples LLM-generated edits with toolchain feedback and formal verification.
Demonstrate that LLM-based evolution can reduce area and delay without human intervention.
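One way to write the task down, using our own notation rather than the paper's: let $R_0$ be the initial repository, $\mathcal{M}(R_0)$ the set of repositories reachable from it by edits, $\mathcal{V}(R, R_0)$ a verification function that returns 1 when $R$ is functionally equivalent to $R_0$, and $A(R)$, $D(R)$ the area and delay reported by synthesis and timing analysis. Assuming ADP is used as the scalar objective (which is how the results below are reported), the goal and the reported improvement can be sketched as:

```latex
\begin{aligned}
R^{\star} &= \arg\min_{R \in \mathcal{M}(R_0)} \; A(R)\,D(R)
  \quad \text{s.t.} \quad \mathcal{V}(R, R_0) = 1, \\
\Delta_{\mathrm{ADP}} &= 1 - \frac{A(R^{\star})\,D(R^{\star})}{A(R_0)\,D(R_0)}
\end{aligned}
```

The equivalence constraint is what makes the task "function-preserving": any candidate that fails $\mathcal{V}$ is discarded regardless of its PPA.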

Proposed Method

Formal task formulation with a verification function that checks functional equivalence and an evaluation function that measures PPA using logic synthesis (LS) and static timing analysis (STA) tools.
Construction of a repo-level RTL benchmark comprising 11 multi-file Verilog designs with diverse application domains.
Graph-based Code Analyzer converts RTL into a structured Control Data Flow Graph (CDFG) and annotates bottlenecks from tool reports.
Prompt Generator creates context-rich prompts from annotated data and code snippets for LLMs to propose targeted edits.
LLM-driven mutation via prompt-driven offspring generation, followed by rapid rectification and formal verification to ensure equivalence.
Evolution algorithm (Dual-Cycle LLM-Guided RTL Evolution) that uses an island-model, MAP-Elites-style archive to balance quality and diversity.
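The graph-based code analyzer is described only at a high level. As a rough illustration of what "annotating bottlenecks" on a CDFG can mean, the sketch below treats the graph as a DAG with per-node delays and marks the nodes on the longest (critical) path. In the paper the bottlenecks come from synthesis and STA reports; here the delays are hypothetical numbers, and `critical_path` is our own illustrative helper, not part of CktEvo. It needs Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def critical_path(delays, edges):
    """Return (total_delay, path) for the longest-delay path in a DAG.

    delays: dict mapping every node to its (hypothetical) delay
    edges:  list of (src, dst) data/control dependencies
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    # TopologicalSorter expects a mapping: node -> its predecessors.
    order = TopologicalSorter({n: preds[n] for n in delays}).static_order()

    arrival, best_pred = {}, {}
    for node in order:
        incoming = preds[node]
        if incoming:
            # The latest-arriving predecessor determines this node's arrival time.
            p = max(incoming, key=lambda x: arrival[x])
            arrival[node] = arrival[p] + delays[node]
            best_pred[node] = p
        else:
            arrival[node] = delays[node]
            best_pred[node] = None

    # Walk back from the latest-arriving node to recover the critical path.
    end = max(arrival, key=arrival.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return arrival[end], path[::-1]


# Toy CDFG: a multiplier feeding a comparator dominates the adder branch.
delays = {"in_a": 0, "in_b": 0, "mul": 5, "add": 2, "cmp": 1, "reg_d": 1}
edges = [("in_a", "mul"), ("in_b", "mul"), ("in_a", "add"),
         ("mul", "cmp"), ("add", "cmp"), ("cmp", "reg_d")]
total, path = critical_path(delays, edges)
print(total, path)
```

Nodes on the returned path are the natural candidates to highlight in a prompt, since shortening any of them shortens the overall delay; nodes off the path (here `add`) have slack.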

Figure 1. Overview of the closed-loop evolution framework.
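The evolution algorithm is summarized as an island model with a MAP-Elites-style archive. A minimal sketch of that general idea, not the paper's Dual-Cycle algorithm: each island keeps one elite per behaviour cell, a newcomer replaces an elite only if it has lower ADP, parents are drawn from all occupied cells (preserving diversity), and islands periodically pass their best candidate to a neighbour. The `Candidate` and `Island` classes, the descriptor cells, and ring migration are all illustrative assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str          # hypothetical id of an evolved repository version
    adp: float         # fitness: area-delay product, lower is better
    descriptor: tuple  # hypothetical behaviour cell, e.g. (edited_file, edit_kind)


@dataclass
class Island:
    archive: dict = field(default_factory=dict)  # cell -> elite Candidate

    def insert(self, cand: Candidate) -> bool:
        """Keep cand if its cell is empty or it beats the cell's current elite."""
        elite = self.archive.get(cand.descriptor)
        if elite is None or cand.adp < elite.adp:
            self.archive[cand.descriptor] = cand
            return True
        return False

    def sample_parent(self, rng: random.Random) -> Candidate:
        """Uniform choice over elites, so every occupied cell can seed offspring."""
        return rng.choice(list(self.archive.values()))


def migrate(islands: list) -> None:
    """Ring migration: each non-empty island sends its best elite to the next one."""
    bests = [min(isl.archive.values(), key=lambda c: c.adp) if isl.archive else None
             for isl in islands]
    for i, best in enumerate(bests):
        if best is not None:
            islands[(i + 1) % len(islands)].insert(best)
```

Keeping one elite per cell, rather than a single global best, is what lets a slightly worse but structurally different edit survive long enough to be refined later.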

Experimental Results

Research Questions

RQ1: Can repo-level RTL evolution improve PPA while preserving functional behavior without human intervention?
RQ2: Does a closed-loop LLM-guided framework leveraging toolchain feedback enable effective cross-file, repository-scale RTL optimizations?
RQ3: What are the practical gains and design characteristics that influence PPA improvements across diverse RTL repositories?
RQ4: How do the proposed benchmark and framework perform with open-source versus commercial toolchains?

Key Findings

The framework reduces PPA metrics across all benchmark designs with open-source tools, achieving an average ADP reduction of 10.50% using DeepSeek-v3.
Delay reductions exceed area reductions: in the open-source toolchain experiments, the geometric-mean delay drops by 7.92% versus 2.80% for area.
Significant timing improvements occur in control-intensive designs (e.g., hsm, mem_ctrl, sdc_ctrl, simple_cpu, risc) aided by the graph-based code analyzer.
With commercial toolchains, the overall improvements are smaller (geometric-mean ADP reduction of 1.77%) but still demonstrate nontrivial RTL evolution gains, e.g., a 10.61% delay reduction on audio and a 536.77 μm² area reduction on hsm.
LLMs tend to apply local, synthesis-friendly optimizations (coding style, logic flattening, state-machine encoding) rather than full architectural redesigns, highlighting current limits for large, global refactors.
The framework operates in a closed-loop, cross-file setting and demonstrates that repository-level, function-preserving RTL optimization is feasible without human interaction.
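Because ADP is the product of area and delay, per-design ratios multiply and geometric means factor: the geometric mean of per-design ADP ratios equals the product of the geometric-mean area and delay ratios. Assuming the summary figures are geometric means over the same set of designs, the open-source numbers are consistent with each other: 1 − (1 − 0.0280)(1 − 0.0792) ≈ 0.1050, i.e. the reported 10.50%. The helpers below are a hypothetical sketch of that arithmetic; the per-design values are made up, not taken from the paper.

```python
import math


def ratio_geomean(before, after):
    """Geometric mean of per-design after/before ratios (lower is better)."""
    if len(before) != len(after) or not before:
        raise ValueError("need matching, non-empty lists")
    logs = [math.log(a / b) for b, a in zip(before, after)]
    return math.exp(sum(logs) / len(logs))


def geomean_reduction(before, after):
    """Reduction as a fraction: 1 - geomean(after / before)."""
    return 1.0 - ratio_geomean(before, after)


def combined_adp_reduction(area_reduction, delay_reduction):
    """ADP = area * delay, so the remaining fractions multiply."""
    return 1.0 - (1.0 - area_reduction) * (1.0 - delay_reduction)


# Hypothetical per-design values (NOT from the paper): two designs.
area_before, area_after = [1000.0, 400.0], [980.0, 392.0]
delay_before, delay_after = [2.0, 5.0], [1.8, 4.9]

adp_before = [a * d for a, d in zip(area_before, delay_before)]
adp_after = [a * d for a, d in zip(area_after, delay_after)]

direct = geomean_reduction(adp_before, adp_after)
factored = combined_adp_reduction(geomean_reduction(area_before, area_after),
                                  geomean_reduction(delay_before, delay_after))
print(f"{direct:.4%} {factored:.4%}")
```

The two printed values agree, which is why a modest area gain and a larger delay gain compound into an ADP reduction larger than either alone.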

This summary was generated by AI and reviewed by human editors.