QUICK REVIEW

[Paper Review] Using LLMs to Facilitate Formal Verification of RTL

Marcelo Orenes-Vera, Margaret Martonosi | arXiv (Cornell University) | September 18, 2023

Formal Methods in Verification | Citations: 13

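One-Line Summary

This paper investigates using GPT-4 to generate correct SystemVerilog Assertions (SVA) directly from RTL, without a predefined specification. It integrates this flow into AutoSVA to improve FPV coverage and also applies it to assist RTL generation.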

ABSTRACT

Formal property verification (FPV) has existed for decades and has been shown to be effective at finding intricate RTL bugs. However, formal properties, such as those written as SystemVerilog Assertions (SVA), are time-consuming and error-prone to write, even for experienced users. Prior work has attempted to lighten this burden by raising the abstraction level so that SVA is generated from high-level specifications. However, this does not eliminate the manual effort of reasoning and writing about the detailed hardware behavior. Motivated by the increased need for FPV in the era of heterogeneous hardware and the advances in large language models (LLMs), we set out to explore whether LLMs can capture RTL behavior and generate correct SVA properties. First, we design an FPV-based evaluation framework that measures the correctness and completeness of SVA. Then, we evaluate GPT4 iteratively to craft the set of syntax and semantic rules needed to prompt it toward creating better SVA. We extend the open-source AutoSVA framework by integrating our improved GPT4-based flow to generate safety properties, in addition to facilitating their existing flow for liveness properties. Lastly, our use cases evaluate (1) the FPV coverage of GPT4-generated SVA on complex open-source RTL and (2) using generated SVA to prompt GPT4 to create RTL from scratch. Through these experiments, we find that GPT4 can generate correct SVA even for flawed RTL, without mirroring design errors. Particularly, it generated SVA that exposed a bug in the RISC-V CVA6 core that eluded the prior work's evaluation.

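Motivation and Objectives

Address the problem that formal properties are time-consuming and error-prone to write.
Explore whether LLMs can capture RTL behavior and generate correct SVA from the RTL alone.
Develop an iterative rule-refinement workflow that steers GPT-4, through prompting rather than retraining, toward valid and complete SVA properties.
Implement AutoSVA2, an extension of AutoSVA with a GPT-4-based SVA generation flow, and evaluate it on complex RTL modules.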

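Proposed Method

Design an FPV-based evaluation framework that judges the correctness and completeness of SVA.
Iteratively refine the GPT-4 prompt rules so that the SVA generated from RTL is syntactically and semantically correct.
Integrate the improved GPT-4 flow into the extended AutoSVA framework (AutoSVA2) to generate safety properties alongside AutoSVA's existing liveness properties.
Evaluate GPT-4-generated SVA on complex RTL modules of CVA6 (PTW and TLB) and compare the resulting RTL coverage.
Have GPT-4 generate RTL from scratch using an iterative RTL/SVA loop guided by FPV feedback.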
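The rule-refinement loop described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the AutoSVA2 implementation: `refine_rules`, `FpvReport`, and the three `toy_*` callables are hypothetical names, and the toy functions stand in for the GPT-4 call, the FPV tool run, and the engineer who writes a new rule after reading the FPV report.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FpvReport:
    """Hypothetical summary of one FPV run (mirrors the results table columns)."""
    compiled: bool
    num_properties: int
    num_failing: Optional[int]  # None when the SVA did not compile
    issue: str = ""


def refine_rules(generate_sva: Callable, run_fpv: Callable, propose_rule: Callable,
                 rtl: str, max_iters: int = 30) -> Tuple[List[str], List[FpvReport]]:
    """Generate SVA, check it with FPV, and add one rule per failed iteration."""
    rules: List[str] = []
    history: List[FpvReport] = []
    for _ in range(max_iters):
        sva = generate_sva(rtl, rules)       # assumed stand-in for the LLM call
        report = run_fpv(rtl, sva)           # assumed stand-in for the FPV tool
        history.append(report)
        if report.compiled and report.num_failing == 0:
            break                            # full proof: stop refining
        rules.append(propose_rule(report))   # engineer writes a rule from the report
    return rules, history


# Toy stand-ins so the sketch runs end to end (all hypothetical).
REQUIRED = ["prefix signals with the module name", "use |=> for next-cycle checks"]


def toy_generate(rtl, rules):
    # The "SVA" is modelled as the set of rules the generator obeyed.
    return set(rules)


def toy_fpv(rtl, sva):
    missing = [r for r in REQUIRED if r not in sva]
    if not missing:
        return FpvReport(True, 8, 0)
    if missing[0] == REQUIRED[0]:
        return FpvReport(False, 4, None, issue=missing[0])  # compile error
    return FpvReport(True, 6, 2, issue=missing[0])          # failing properties


def toy_engineer(report):
    return report.issue


rules, history = refine_rules(toy_generate, toy_fpv, toy_engineer, rtl="fifo.sv")
print(len(history), rules)
```

In this toy run the loop needs three iterations: one compile failure, one run with failing properties, and one full proof. The real process reported in the paper took far more iterations because earlier rules were sometimes ignored in later generations.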

Figure 1: FPV-based evaluation framework. The FPV tool returns whether the assertions generated by the LLM are correct or not—for a given RTL. Hinted by the errors or CEXs of the FPV report, the engineer manually writes or refines the rules that guide the LLM toward generating better SVA. The rule s

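Experimental Results

Research Questions

RQ1: Can an LLM generate correct SVA properties from RTL without an explicit high-level specification?
RQ2: How can prompt rules be designed to teach an LLM the semantics and timing of SVA?
RQ3: Does integrating an LLM-based SVA flow into AutoSVA help RTL property coverage and bug detection?
RQ4: Can GPT-4 generate RTL from scratch when guided by SVA prompts, and does FPV feedback improve RTL quality?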

Key Results

Iteration	Compile	Number of Properties (#Prop)	Number of Failing Properties (#Fail)	Main Issues
1	✗	4	-	IN: Undeclared var (no module prefix)
2	✗	6	-	SY: Wrong keyword for assertion
3	✗	8	-	SY: Using foreach as an assertion loop
4	✗	4	-	IN: Undeclared buffer_head_r
5	✗	9	-	SY: Error in include and assert naming
6	✗	6	-	SY: Duplicated assertion name
7	✓	6	4	WT: Wrong time semantics |->
8	✗	6	3	IN: Undeclared var (no module prefix)
9	✗	5	-	IN: Module prefix; ignored prev. rules
10	✓	9	5	WT: Wrong time semantics |=>
11	✓	7	1	WT: Missing $past in postcondition
12	✓	10	7	WT: Too much $past usage
13	✗	9	-	SY: Forgot foreach rule from T3
14	✓	12	4	WS: Incr. without wrap; wrong signal
15	✓	8	2	WT: Missing $past in postcondition
16	✓	9	4	WT/WS: Wrong bitwise manipulation
17	✓	8	3	WT: Wrong time semantics |=>
18	✗	10	0	SY: Array-named assertions as_name[i]
19	✗	12	2	SY/WS: Empty precond.;wrong bitwise
20	✗	10	-	SY: Wrong width in constant usage
21	✓	7	1	WT: Missing $past for register
22	✓	9	1	WT: Incorrect $past in precondition
23	✓	8	0	Full Proof
24	✓	8	1	Assuming wrong behavior about out_rdy
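Many of the WT rows in the table come down to the one-cycle difference between the overlapped implication `|->` and the non-overlapped implication `|=>`, and to where `$past` belongs. The following Python sketch models those semantics on sampled per-cycle traces. It is a simplification for illustration only: it ignores clocking events, `disable iff`, and obligations still pending at the end of the trace. The signal names `push` and `valid` are hypothetical and not taken from the paper's FIFO.

```python
from typing import List


def overlapped_failures(pre: List[int], post: List[int]) -> List[int]:
    """Model of `pre |-> post`: post must hold in the same cycle as pre.
    Returns the cycles at which the property fails."""
    return [t for t in range(len(pre)) if pre[t] and not post[t]]


def non_overlapped_failures(pre: List[int], post: List[int]) -> List[int]:
    """Model of `pre |=> post`: post must hold one cycle after pre.
    Returns the cycles at which the property fails."""
    return [t + 1 for t in range(len(pre) - 1) if pre[t] and not post[t + 1]]


def past(sig: List[int]) -> List[int]:
    """Model of `$past(sig)`: the value sampled one cycle earlier.
    Simplification: the first cycle is treated as 0."""
    return [0] + sig[:-1]


# A registered output: `valid` rises one cycle after `push` (hypothetical signals).
push = [1, 0, 1, 0]
valid = [0, 1, 0, 1]

print(overlapped_failures(push, valid))         # [0, 2]  wrong operator for a registered output
print(non_overlapped_failures(push, valid))     # []      next-cycle check matches the register
print(overlapped_failures(valid, past(push)))   # []      same check phrased with $past
```

The first check fails because it expects the registered output in the same cycle as its cause. The second and third checks are two equivalent ways to express the next-cycle relationship, which is the kind of distinction the prompt rules had to teach.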

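GPT-4 can generate correct SVA from buggy RTL without mirroring the design errors.
SVA quality improves as the rule set guiding GPT-4 is refined; for the FIFO module, iteration 23 (T23) reaches a full proof, with all 8 properties compiling and none failing.
AutoSVA2 substantially increases RTL behavior coverage over AutoSVA alone, with toggle coverage improving by up to 6x on certain modules.
GPT-4 generates failing assertions that expose a bug in the RISC-V CVA6 PTW; the case matching the known RTL bug is found after multiple batches.
Using multiple batches of GPT-4-generated SVA increases RTL coverage further (e.g., PTW: up to about 1.25x statement coverage with six batches; TLB: up to about 6x).
Combining AutoSVA assertions with GPT-4-generated assertions provides complementary coverage on some modules.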

Figure 2: Overview of AutoSVA2. Our additions to the original AutoSVA flow are shown with thick boxes and arrows; the original flow is shown with thin boxes and arrows. The green boxes indicate automatically generated artifacts. The green arrows indicate the SVA generation flow and the blue arrows t


This review was generated by AI and checked by a human editor.