QUICK REVIEW

[Paper Review] Unlocking Hardware Security Assurance: The Potential of LLMs

Xingyu Meng, Amisha Srivastava|arXiv (Cornell University)|Aug 21, 2023

Physical Unclonable Functions (PUFs) and Hardware Security | Cited by 16

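One-Sentence Summary

This paper proposes NSPG, an NLP-based framework that uses HS-BERT to automatically extract hardware security properties from SoC documentation for vulnerability detection and bug discovery; it identified 326 properties in 1,723 sentences, uncovered eight bugs in OpenTitan, and outperformed ChatGPT by about 15%.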

ABSTRACT

System-on-Chips (SoCs) form the crux of modern computing systems. SoCs enable high-level integration through the utilization of multiple Intellectual Property (IP) cores. However, the integration of multiple IP cores also presents unique challenges owing to their inherent vulnerabilities, thereby compromising the security of the entire system. Hence, it is imperative to perform hardware security validation to address these concerns. The efficiency of this validation procedure is contingent on the quality of the SoC security properties provided. However, generating security properties with traditional approaches often requires expert intervention and is limited to a few IPs, thereby resulting in a time-consuming and non-robust process. To address this issue, we, for the first time, propose a novel and automated Natural Language Processing (NLP)-based Security Property Generator (NSPG). Specifically, our approach utilizes hardware documentation in order to propose the first hardware security-specific language model, HS-BERT, for extracting security properties dedicated to hardware design. To evaluate our proposed technique, we trained the HS-BERT model using sentences from RISC-V, OpenRISC, MIPS, OpenSPARC, and OpenTitan SoC documentation. When assessed on five untrained OpenTitan hardware IP documents, NSPG was able to extract 326 security properties from 1723 sentences. This, in turn, aided in identifying eight security bugs in the OpenTitan SoC design presented in the hardware hacking competition, Hack@DAC 2022.

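Research Motivation and Objectives

Address the challenge of generating robust hardware security properties for SoCs that integrate multiple IP cores.
Develop an automated NLP-based property generator (NSPG) that uses a hardware-domain BERT (HS-BERT) to extract security properties from design documentation.
Create domain-specific data augmentation and modification techniques for training HS-BERT and the sequence classifier used for property identification.
Validate NSPG on unseen OpenTitan documents, demonstrate real bug discovery on the Hack@DAC 2022 design, and benchmark the results against ChatGPT.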

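Proposed Method

Build NSPG from three components: data augmentation, domain adaptation through masked language model (MLM) pre-training on hardware documentation (HS-BERT), and a sequence classification model (SCM).
Assemble hardware documentation datasets: D_pre (15,583 sentences for MLM pre-training), D_cls (4,427 sentences labeled as property or non-property), and D_val (708 labeled sentences for validation on unseen data).
Apply data augmentation (random swap, random deletion, synonym replacement, random insertion) and domain-specific fragment insertion to enrich the training data.
Fine-tune the SCM with HS-BERT on the labeled data to classify sentences as related or unrelated to security properties, selecting the MOT-based data modification, which gave the best results.
Compare HS-BERT, general BERT, and SciBERT across configurations (baseline, MT, MOT, MTT, MOTMT) to select the best-performing model.
Evaluate on OpenTitan and OpenTitan-derived documents, demonstrate property extraction, and then apply the extracted properties to detect bugs in the OpenTitan design.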
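
The four augmentation operations named in the method (random swap, random deletion, synonym replacement, random insertion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy `SYNONYMS` table, and the example sentence are all hypothetical, and the summary does not say which synonym source or sampling rates NSPG uses.

```python
import random

# Hypothetical toy synonym table; the paper's actual synonym source is not
# described in this summary.
SYNONYMS = {"must": ["shall"], "locked": ["write-protected"]}


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, repeated n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def synonym_replacement(tokens, synonyms, n_replace, rng):
    """Replace up to n_replace tokens that have an entry in the table."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n_replace]:
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out


def random_insertion(tokens, synonyms, n_insert, rng):
    """Insert a synonym of a random known token at a random position."""
    out = list(tokens)
    for _ in range(n_insert):
        known = [t.lower() for t in out if t.lower() in synonyms]
        if not known:
            break
        word = rng.choice(known)
        out.insert(rng.randint(0, len(out)), rng.choice(synonyms[word]))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical documentation-style sentence, not taken from the paper.
    sentence = "The key register must be locked after boot".split()
    print(" ".join(random_swap(sentence, 1, rng)))
    print(" ".join(random_deletion(sentence, 0.2, rng)))
    print(" ".join(synonym_replacement(sentence, SYNONYMS, 1, rng)))
    print(" ".join(random_insertion(sentence, SYNONYMS, 1, rng)))
```

Each function returns a new token list and leaves its input unchanged, so several augmented variants can be produced from one labeled sentence while its property or non-property label is carried over.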

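Experimental Results

Research Questions

RQ1: Can NSPG automatically mine and generate hardware security properties from SoC documentation?
RQ2: How does the hardware-domain-adapted BERT (HS-BERT) with domain data augmentation perform on this task compared with general BERT models?
RQ3: How do data modification techniques affect security property classification performance?
RQ4: Can the generated properties reveal real bugs in the OpenTitan design and outperform baseline or non-domain models?

Key Findings

NSPG extracted 326 security properties from 1,723 sentences in unseen OpenTitan documents.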
Eight security bugs in the Hack@DAC 2022 OpenTitan design were identified using NSPG-generated properties.
NSPG outperformed ChatGPT by about 15% in identifying security properties in OpenTitan documentation.
Among HS-BERT variants, MOT modification yielded the best validation performance with OpenTitan data (82% accuracy, 90% recall on average).
HS-BERT outperformed general BERT and SciBERT in accuracy and recall for the property extraction task on the validation set.
OpenTitan, RISC-V, and OpenRISC validation accuracies reached 81.5%, 79.1%, and 88.3%, respectively, under MOT-HS-BERT, with recalls of 93%, 90.1%, and 87%.
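
For readers who want to relate the reported figures to each other, the sketch below computes the share of sentences flagged as properties (326 of 1,723, from the findings above) and shows how accuracy and recall are derived from a confusion matrix. The confusion-matrix counts are hypothetical round numbers chosen for illustration; the summary does not report the paper's per-class counts.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all sentences that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of true property sentences that the classifier found."""
    return tp / (tp + fn)


# Reported in the findings: 326 properties out of 1,723 sentences.
extraction_rate = 326 / 1723
print(f"share of sentences extracted as properties: {extraction_rate:.1%}")

# Hypothetical counts, NOT from the paper: 100 true property sentences and
# 100 non-property sentences.
tp, fn, tn, fp = 90, 10, 70, 30
print(f"recall:   {recall(tp, fn):.2f}")
print(f"accuracy: {accuracy(tp, fp, tn, fn):.2f}")
```

In this toy case recall (0.90) exceeds accuracy (0.80) because most of the errors are false positives. That is one pattern consistent with recall figures higher than accuracy figures, though the actual error breakdown would have to come from the paper itself.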


This review was generated by AI and reviewed by human editors.