[Paper Summary] Fuzz4All: Universal Fuzzing with Large Language Models
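Fuzz4All uses large language models as universal input generators and mutators to fuzz many languages and SUTs. It achieves higher coverage than existing language-specific fuzzers and finds many bugs in real-world systems.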
Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solvers, and software libraries with accessible APIs, are especially important as they are fundamental building blocks of software development. However, existing fuzzers for such systems often target a specific language, and thus cannot be easily applied to other languages or even other versions of the same language. Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features. This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. The key idea behind Fuzz4All is to leverage large language models (LLMs) as an input generation and mutation engine, which enables the approach to produce diverse and realistic inputs for any practically relevant language. To realize this potential, we present a novel autoprompting technique, which creates LLM prompts that are well-suited for fuzzing, and a novel LLM-powered fuzzing loop, which iteratively updates the prompt to create new fuzzing inputs. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers. Furthermore, Fuzz4All has identified 98 bugs in widely used systems, such as GCC, Clang, Z3, CVC5, OpenJDK, and the Qiskit quantum computing platform, with 64 bugs already confirmed by developers as previously unknown.
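Motivation and Goals
- Propose a universal fuzzing approach that works across many input languages and language features.
- Remove the need for language-specific fuzzers by using LLMs for input generation and mutation.
- Construct prompts automatically via autoprompting, which distills arbitrary user input into effective fuzzing prompts.
- Build an iterative LLM-powered fuzzing loop that produces diverse, feature-rich inputs.
- Evaluate the approach on multiple languages and real-world systems, measuring coverage and bug finding.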
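Proposed Method
- Introduces autoprompting, which distills user-provided documentation into concise fuzzing prompts.
- Uses a distillation LLM to generate candidate prompts and a generation LLM to produce fuzzing inputs.
- Scores each candidate prompt by the validity (or coverage potential) of the inputs it yields, then selects the best prompt.
- Implements an LLM-powered fuzzing loop that adds example inputs and a generation strategy to the prompt to create new inputs.
- Combines the generate-new, mutate-existing, and semantic-equiv strategies to diversify fuzzing inputs over iterations.
- Checks fuzzing outputs against a user-defined oracle (e.g., crashes) without instrumenting the SUT.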
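The autoprompting selection and the fuzzing loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The two LLMs, the validity check, and the oracle are passed in as plain callables. The names `autoprompt`, `fuzz_loop`, and `STRATEGIES`, the instruction wording, and the choice to score a prompt by counting valid outputs are all hypothetical choices made for this sketch.

```python
import random
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt and returns one generated text.
LLM = Callable[[str], str]

# Illustrative instruction wording for the three strategies (not the paper's exact prompts).
STRATEGIES: Dict[str, str] = {
    "generate-new": "Please create a new program.",
    "mutate-existing": "Please create a mutated program that modifies the previous program.",
    "semantic-equiv": "Please create a semantically equivalent program to the previous program.",
}


def autoprompt(user_input: str, distill_llm: LLM, gen_llm: LLM,
               is_valid: Callable[[str], bool],
               n_candidates: int = 3, n_samples: int = 5) -> str:
    """Distill user docs into candidate prompts; keep the one whose outputs are most often valid."""
    instruction = "Please summarize the above information concisely."  # illustrative wording
    # With a sampling LLM, repeated calls yield different candidate prompts.
    candidates = [distill_llm(f"{user_input}\n{instruction}") for _ in range(n_candidates)]

    def score(prompt: str) -> int:
        # Assumed scoring: number of valid inputs among n_samples generations.
        return sum(is_valid(gen_llm(prompt)) for _ in range(n_samples))

    return max(candidates, key=score)


def fuzz_loop(input_prompt: str, gen_llm: LLM,
              oracle: Callable[[str], bool], is_valid: Callable[[str], bool],
              budget: int, seed: int = 0) -> List[str]:
    """Run `budget` generations, rebuilding the prompt each step; return inputs the oracle flags."""
    rng = random.Random(seed)
    prompt = input_prompt
    examples: List[str] = []  # valid inputs seen so far, reused as in-prompt examples
    flagged: List[str] = []
    for _ in range(budget):
        fuzz_input = gen_llm(prompt)
        if oracle(fuzz_input):  # e.g. the SUT crashed on this input
            flagged.append(fuzz_input)
        if is_valid(fuzz_input):
            examples.append(fuzz_input)
        if examples:
            # New prompt = original prompt + one example + one randomly chosen strategy.
            example = rng.choice(examples)
            strategy = rng.choice(list(STRATEGIES))
            prompt = f"{input_prompt}\n{example}\n{STRATEGIES[strategy]}"
    return flagged
```

To use this with real systems, you would swap the callables for a distillation model (`distill_llm`), a code-generation model (`gen_llm`), and a check such as "the SUT accepts this input" (`is_valid`). The sketch rebuilds the prompt from the original input prompt on every iteration instead of appending to it. This keeps the prompt length bounded no matter how long the loop runs.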

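Experimental Results
Research Questions
- RQ1: How does Fuzz4All compare with existing language-specific fuzzers in code coverage?
- RQ2: Can Fuzz4All effectively perform targeted fuzzing of specific language features?
- RQ3: How much do the autoprompting and fuzzing-loop components contribute to overall effectiveness?
- RQ4: What bugs does Fuzz4All find in real-world systems across multiple languages and SUTs?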
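Key Findings
- Across six languages and nine SUTs, Fuzz4All achieves higher code coverage than state-of-the-art fuzzers, with an average improvement of 36.8%.
- It found 98 bugs in widely used systems, and developers have confirmed 64 of them as previously unknown.
- Fuzz4All supports both general fuzzing and feature-targeted fuzzing driven by user-provided documentation or specifications.
- An autoprompting strategy distills arbitrary user input into effective fuzzing prompts, which makes the approach applicable across languages.
- An LLM-powered fuzzing loop keeps updating the prompt with example inputs and generation strategies to produce diverse inputs.
- The approach requires no instrumentation of the SUT, which simplifies practical use.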
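The oracle only observes the SUT from the outside, so a crash check can be as simple as a subprocess call. The sketch below makes several assumptions. It assumes a POSIX system, where a negative return code means the process was killed by a signal. It also assumes the SUT is invoked as `sut_cmd <input-file>`. The function `crash_oracle`, its labels, and the `internal compiler error` text match are hypothetical choices for illustration and do not describe Fuzz4All's actual oracle.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def crash_oracle(sut_cmd: List[str], fuzz_input: str,
                 suffix: str = ".txt", timeout: float = 10.0) -> Optional[str]:
    """Hypothetical black-box oracle: run the SUT on one input, return a bug label or None.

    No instrumentation is needed; only the exit status and stderr are inspected.
    Assumes POSIX semantics, where return code -N means "killed by signal N".
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"input{suffix}"  # e.g. suffix=".c" for a C compiler
        path.write_text(fuzz_input, encoding="utf-8")
        try:
            proc = subprocess.run(sut_cmd + [str(path)], capture_output=True,
                                  text=True, errors="replace", timeout=timeout)
        except subprocess.TimeoutExpired:
            return "hang"  # treating a timeout as a potential bug is an assumption
    if proc.returncode < 0:
        return f"crash (signal {-proc.returncode})"
    if "internal compiler error" in proc.stderr:
        # Illustrative extra check for compilers such as GCC that report ICEs on stderr.
        return "internal compiler error"
    return None  # normal success, or an ordinary rejection of invalid input
```

Here is an example that assumes `gcc` is on `PATH`: `crash_oracle(["gcc", "-c", "-o", "/dev/null"], src, suffix=".c")` compiles one generated C file and reports only abnormal terminations. Ordinary compile errors on invalid programs are not flagged.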

This summary was generated by AI and reviewed by human editors.