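[Paper Summary] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool
VerilogCoder uses a multi-agent framework with a TCRG task planner and AST-based waveform tracing to autonomously generate Verilog code and fix syntax and functional errors, achieving a high pass rate on VerilogEval-Human v2.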
Due to the growing complexity of modern Integrated Circuits (ICs), automating hardware design can remove a significant amount of human error from the engineering process and result in fewer errors. Verilog is a popular hardware description language for designing and modeling digital systems; thus, Verilog generation is one of the emerging areas of research to facilitate the design process. In this work, we propose VerilogCoder, a system of multiple Artificial Intelligence (AI) agents for Verilog code generation, to autonomously write Verilog code and fix syntax and functional errors using collaborative Verilog tools (i.e., syntax checker, simulator, and waveform tracer). Firstly, we propose a task planner that utilizes a novel Task and Circuit Relation Graph retrieval method to construct a holistic plan based on module descriptions. To debug and fix functional errors, we develop a novel and efficient abstract syntax tree (AST)-based waveform tracing tool, which is integrated within the autonomous Verilog completion flow. The proposed methodology successfully generates 94.2% syntactically and functionally correct Verilog code, surpassing the state-of-the-art methods by 33.9% on the VerilogEval-Human v2 benchmark.
Motivation and Objectives
- Motivate automatic Verilog code generation to reduce design time and human error in complex ICs.
- Develop a Task and Circuit Relation Graph (TCRG) based planner to produce sub-tasks with signal and transition details.
- Integrate an AST-based waveform tracing tool to diagnose and fix functional errors during Verilog generation.
- Enable collaboration among multiple AI agents (planning, coding, debugging) using ReAct prompting and Verilog tooling.
- Evaluate through ablation studies on VerilogEval-Human v2 to quantify improvements from planning and AST-WT components.
Proposed Method
- Introduce a multi-AI agent framework for Verilog coding and debugging.
- Develop a Task and Circuit Relation Graph (TCRG) based task planner to generate sub-tasks with signal and transition information.
- Create an AST-based waveform tracing tool (AST-WT) to backtrace mismatched waveforms via Verilog ASTs (Pyverilog).
- Use a two-LLM code path: Verilog Engineer and Verilog Verification Assistant for syntax-correct code generation.
- Use a Debug Agent aided by a Verilog simulator and AST-WT to validate functionality against a testbench.
- Leverage ReAct prompting for Thought-Action-Observation reasoning to orchestrate tools (iverilog, waveform tracing, simulator).
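
The backtracing idea behind AST-WT can be illustrated with a minimal sketch. It is not the paper's tool: the driver map `DRIVERS` is hand-written here, whereas a real flow would derive it from a Verilog AST (for example via Pyverilog), and the helper names `backtrace` and `waveform_rows` are hypothetical. Starting from a mismatched output, it walks back through the signals that drive it and collects their values in the cycles just before the mismatch.

```python
from collections import deque

# Hypothetical driver map: each signal maps to the signals on the right-hand
# side of its assignment. A real tool would extract this from a Verilog AST.
DRIVERS = {
    "out": ["state", "in"],
    "state": ["next_state"],
    "next_state": ["state", "in", "reset"],
}


def backtrace(signal, drivers, max_depth):
    """Return {signal: depth} for every signal within max_depth driver hops."""
    seen = {signal: 0}
    queue = deque([signal])
    while queue:
        current = queue.popleft()
        depth = seen[current]
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for src in drivers.get(current, []):
            if src not in seen:
                seen[src] = depth + 1
                queue.append(src)
    return seen


def waveform_rows(signals, waveform, t_mismatch, window):
    """Collect values of traced signals from `window` cycles before the mismatch."""
    start = max(0, t_mismatch - window)
    return {s: waveform[s][start:t_mismatch + 1] for s in signals if s in waveform}


if __name__ == "__main__":
    traced = backtrace("out", DRIVERS, max_depth=2)
    wave = {"out": [0, 0, 1, 1], "state": [0, 1, 1, 0], "in": [1, 1, 0, 0]}
    print(traced)
    print(waveform_rows(traced, wave, t_mismatch=2, window=1))
```

Limiting the depth keeps the report small enough to hand to an LLM agent as an observation; the depth and window values above are arbitrary choices for illustration.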

Experimental Results
Research Questions
- RQ1: How can a graph-based task planner (TCRG) improve sub-task quality for Verilog code generation compared to traditional planning?
- RQ2: Does integrating an AST-based waveform tracing tool improve functional correctness in autonomous Verilog coding?
- RQ3: What is the impact of a multi-AI-agent architecture on syntax and functional correctness in Verilog generation?
- RQ4: How does the proposed VerilogCoder perform on the VerilogEval-Human v2 benchmark relative to existing LLM approaches?
Key Findings
| Method | Model Size | Model Type | Pass Rate (%) |
|---|---|---|---|
| RTL-Coder | 6.7B | Open | 36.5 |
| DeepSeek Coder | 6.7B | Open | 28.2 |
| CodeGemma | 7B | Open | 23.1 |
| DeepSeek Coder | 33B | Open | 37.2 |
| CodeLlama | 70B | Open | 41.0 |
| Llama3 | 70B | Open | 41.7 |
| Mistral Large | Undisclosed | Closed | 48.7 |
| GPT-4 | Undisclosed | Closed | 50.6 |
| GPT-4 Turbo | Undisclosed | Closed | 60.3 |
| VerilogCoder (Llama3) | 70B | Open | 67.3 |
| VerilogCoder (GPT-4 Turbo) | Undisclosed | Closed | 94.2 |
- VerilogCoder achieves a 94.2% pass rate for syntax and functional correctness on VerilogEval-Human v2, surpassing the state of the art by 33.9 percentage points (94.2% versus 60.3% for GPT-4 Turbo alone).
- Ablation shows the TCRG-based planner plus AST-WT yields the largest gains, with AST-WT contributing to an 11.5% improvement in certain task categories.
- Compared to non-agent baselines, VerilogCoder demonstrates substantial improvements in combinational and FSM-related tasks (e.g., K-map, transition tables).
- GPT-4 Turbo-based VerilogCoder achieves 94.2% pass rate, while Llama3-based VerilogCoder reaches 67.3% on the same benchmark.
- On average, each task needed only a few agent rounds and tool calls: 1.58 high-level planner rounds, 1.09 TCRG retrieval rounds, and, for the code agent, about 2.37 simulator calls and 1.37 AST-WT calls.
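
The reported margin can be checked directly against the table. The short sketch below recomputes it from the baseline pass rates listed above; the dictionary layout and the function name `margin_over_best` are illustrative assumptions, not something taken from the paper.

```python
# Baseline pass rates (%) copied from the table above, keyed by (method, size).
BASELINES = {
    ("RTL-Coder", "6.7B"): 36.5,
    ("DeepSeek Coder", "6.7B"): 28.2,
    ("CodeGemma", "7B"): 23.1,
    ("DeepSeek Coder", "33B"): 37.2,
    ("CodeLlama", "70B"): 41.0,
    ("Llama3", "70B"): 41.7,
    ("Mistral Large", "Undisclosed"): 48.7,
    ("GPT-4", "Undisclosed"): 50.6,
    ("GPT-4 Turbo", "Undisclosed"): 60.3,
}


def margin_over_best(rate, baselines):
    """Return (best baseline key, gap in percentage points, rounded to 0.1)."""
    best = max(baselines, key=baselines.get)
    return best, round(rate - baselines[best], 1)


if __name__ == "__main__":
    # VerilogCoder (GPT-4 Turbo) against the strongest baseline in the table.
    print(margin_over_best(94.2, BASELINES))
    # VerilogCoder (Llama3) against its own base model.
    print(round(67.3 - BASELINES[("Llama3", "70B")], 1))
```

The gap is a difference in percentage points, not a relative improvement, which is why it equals 94.2 minus 60.3.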

This summary was generated by AI and reviewed by a human editor.