QUICK REVIEW

[Paper Review] Using LLMs to Facilitate Formal Verification of RTL

Marcelo Orenes-Vera, Margaret Martonosi|arXiv (Cornell University)|Sep 18, 2023

Formal Methods in Verification|Cited by 13

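One-Sentence Summary

This paper studies whether GPT-4 can generate correct SystemVerilog Assertions (SVA) directly from RTL, without a predefined specification, and integrates the resulting flow into AutoSVA to raise FPV coverage and even assist RTL generation.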

ABSTRACT

Formal property verification (FPV) has existed for decades and has been shown to be effective at finding intricate RTL bugs. However, formal properties, such as those written as SystemVerilog Assertions (SVA), are time-consuming and error-prone to write, even for experienced users. Prior work has attempted to lighten this burden by raising the abstraction level so that SVA is generated from high-level specifications. However, this does not eliminate the manual effort of reasoning and writing about the detailed hardware behavior. Motivated by the increased need for FPV in the era of heterogeneous hardware and the advances in large language models (LLMs), we set out to explore whether LLMs can capture RTL behavior and generate correct SVA properties. First, we design an FPV-based evaluation framework that measures the correctness and completeness of SVA. Then, we evaluate GPT4 iteratively to craft the set of syntax and semantic rules needed to prompt it toward creating better SVA. We extend the open-source AutoSVA framework by integrating our improved GPT4-based flow to generate safety properties, in addition to facilitating their existing flow for liveness properties. Lastly, our use cases evaluate (1) the FPV coverage of GPT4-generated SVA on complex open-source RTL and (2) using generated SVA to prompt GPT4 to create RTL from scratch. Through these experiments, we find that GPT4 can generate correct SVA even for flawed RTL, without mirroring design errors. Particularly, it generated SVA that exposed a bug in the RISC-V CVA6 core that eluded the prior work's evaluation.

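Motivation and Goals

Address the challenge that formal properties are time-consuming and error-prone to write.
Explore whether large language models can capture RTL behavior and generate correct SVA from the RTL alone.
Develop an iterative rule-refinement process that guides GPT-4, through prompting rather than retraining, toward valid and complete SVA properties.
Extend AutoSVA with the GPT-4-based SVA generation flow and evaluate it on complex RTL modules.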

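Proposed Method

Design an FPV-based evaluation framework that measures the correctness and completeness of SVA.
Iteratively refine the GPT-4 prompt rules so that the SVA it generates from RTL is both syntactically and semantically correct.
Integrate the improved GPT-4 flow into an extended AutoSVA framework (AutoSVA2), which generates safety properties in addition to AutoSVA's existing liveness-property flow.
Evaluate GPT-4-generated SVA on complex RTL modules (the PTW and TLB of CVA6) and compare the resulting RTL coverage.
Use GPT-4 to generate RTL from scratch, prompted by generated SVA, in an iterative RTL/SVA loop guided by FPV feedback.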
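
The rule-refinement loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `FpvReport`, `refine_rules`, and the three callables (`generate_sva`, `run_fpv`, `propose_rule`) are hypothetical names introduced here. In the paper the rules are written by an engineer reading the FPV report, so `propose_rule` stands in for that manual step, and the stub functions below only simulate an LLM and an FPV tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class FpvReport:
    """Hypothetical summary of one FPV run over LLM-generated SVA."""
    syntax_errors: List[str] = field(default_factory=list)
    counterexamples: List[str] = field(default_factory=list)

    @property
    def clean(self) -> bool:
        # No syntax errors and no counterexamples (CEXs).
        return not self.syntax_errors and not self.counterexamples


def refine_rules(
    rtl: str,
    rules: List[str],
    generate_sva: Callable[[str, List[str]], str],
    run_fpv: Callable[[str, str], FpvReport],
    propose_rule: Callable[[FpvReport], Optional[str]],
    max_iters: int = 5,
) -> Tuple[List[str], List[FpvReport]]:
    """Grow the prompt rule set until the FPV report is clean or no new rule is proposed."""
    rules = list(rules)  # do not mutate the caller's list
    history: List[FpvReport] = []
    for _ in range(max_iters):
        sva = generate_sva(rtl, rules)   # assumption: wraps an LLM call
        report = run_fpv(rtl, sva)       # assumption: wraps an FPV tool run
        history.append(report)
        if report.clean:
            break
        new_rule = propose_rule(report)  # stands in for the engineer's manual step
        if new_rule is None or new_rule in rules:
            break                        # nothing new to try
        rules.append(new_rule)
    return rules, history


# Stubs that simulate the external tools, for illustration only.
def fake_generate_sva(rtl: str, rules: List[str]) -> str:
    return "good_sva" if "R1" in rules else "bad_sva"


def fake_run_fpv(rtl: str, sva: str) -> FpvReport:
    if sva == "bad_sva":
        return FpvReport(syntax_errors=["simulated syntax error"])
    return FpvReport()


def fake_propose_rule(report: FpvReport) -> Optional[str]:
    return "R1" if report.syntax_errors else None


final_rules, reports = refine_rules(
    "module fifo; endmodule", [], fake_generate_sva, fake_run_fpv, fake_propose_rule
)
```

One caveat the sketch glosses over: a counterexample does not always mean the assertion is wrong. As the findings below note, a failing assertion can also point at a real RTL bug, so a human still has to decide which side to fix.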

Figure 1: FPV-based evaluation framework. The FPV tool returns whether the assertions generated by the LLM are correct or not—for a given RTL. Hinted by the errors or CEXs of the FPV report, the engineer manually writes or refines the rules that guide the LLM toward generating better SVA. The rule s

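Experimental Results

Research Questions

RQ1: Can an LLM generate correct SVA properties from RTL without an explicit high-level specification?
RQ2: How should prompt rules be designed to teach the LLM SVA semantics and timing?
RQ3: Does integrating the LLM-based SVA flow into AutoSVA improve RTL property coverage and bug detection?
RQ4: When prompted with SVA, can GPT-4 generate RTL from scratch, and does FPV feedback improve the quality of that RTL?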

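Key Findings

GPT-4 can generate correct SVA from flawed RTL without mirroring the design errors.
SVA quality improves as the rule set guiding GPT-4 is refined; on the FIFO module, full syntactic correctness was reached after 23 rule-refinement iterations (T23).
AutoSVA2 significantly improves RTL behavior coverage over AutoSVA alone, with toggle coverage up to 6x higher on some modules.
GPT-4 exposed a bug in the RISC-V CVA6 PTW: after several batches, it produced a failing assertion that corresponds to a known RTL defect.
Using multiple batches of GPT-4-generated SVA yields higher RTL coverage (for example, PTW: about 1.25x statement coverage with six batches; TLB: about 6x).
Combining AutoSVA assertions with GPT-4-generated assertions provides complementary coverage for some modules.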
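
To make the "multiple batches" and "complementary coverage" findings concrete, here is a small sketch of how per-batch coverage could be merged and compared against a baseline. It assumes coverage is available as sets of covered item identifiers, which is an assumption made for illustration, not the format of any particular FPV tool's report. The function names and the data are hypothetical and do not reproduce the paper's numbers.

```python
from typing import Iterable, Set


def merged_coverage(batches: Iterable[Set[str]]) -> Set[str]:
    """Union of covered items: an item counts if any batch covers it."""
    merged: Set[str] = set()
    for covered in batches:
        merged |= covered
    return merged


def coverage_ratio(baseline: Set[str], combined: Set[str]) -> float:
    """How many times more items the combined set covers than the baseline."""
    if not baseline:
        raise ValueError("baseline coverage is empty; ratio is undefined")
    return len(combined) / len(baseline)


# Illustrative data only: statement IDs covered by each source of assertions.
autosva_only = {"s1", "s2"}
gpt4_batches = [{"s2", "s3"}, {"s4"}]

combined = merged_coverage([autosva_only, *gpt4_batches])
ratio = coverage_ratio(autosva_only, combined)
```

Because coverage merges as a set union, each additional batch can only add covered items, and two assertion sources are complementary exactly when neither one's covered set contains the other's.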

Figure 2: Overview of AutoSVA2. Our additions to the original AutoSVA flow are shown with thick boxes and arrows; the original flow is shown with thin boxes and arrows. The green boxes indicate automatically generated artifacts. The green arrows indicate the SVA generation flow and the blue arrows t

This review was generated by AI and reviewed by a human editor.