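[Paper Review] Fuzz4All: Universal Fuzzing with Large Language Models
Fuzz4All uses large language models as a universal input generator and mutator to fuzz systems under test (SUTs) across multiple input languages, achieving higher coverage than language-specific fuzzers and finding numerous bugs in real-world systems.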
Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solvers, and software libraries with accessible APIs, are especially important as they are fundamental building blocks of software development. However, existing fuzzers for such systems often target a specific language, and thus cannot be easily applied to other languages or even other versions of the same language. Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features. This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. The key idea behind Fuzz4All is to leverage large language models (LLMs) as an input generation and mutation engine, which enables the approach to produce diverse and realistic inputs for any practically relevant language. To realize this potential, we present a novel autoprompting technique, which creates LLM prompts that are well-suited for fuzzing, and a novel LLM-powered fuzzing loop, which iteratively updates the prompt to create new fuzzing inputs. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers. Furthermore, Fuzz4All has identified 98 bugs in widely used systems, such as GCC, Clang, Z3, CVC5, OpenJDK, and the Qiskit quantum computing platform, with 64 bugs already confirmed by developers as previously unknown.
Research Motivation and Objectives
- Propose a universal fuzzing approach that works across multiple input languages and features.
- Eliminate the need for language-specific fuzzers by leveraging LLMs for input generation and mutation.
- Automate prompt generation via autoprompting to distill arbitrary user inputs into effective fuzzing prompts.
- Develop an iterative LLM-powered fuzzing loop to produce diverse, feature-rich inputs.
- Evaluate the approach across multiple languages and real-world systems to assess coverage and bug finding.
Proposed Method
- Introduce autoprompting to distill user-provided documents into a concise fuzzing prompt.
- Use a distillation LLM to generate candidate prompts and a generation LLM to produce fuzzing inputs.
- Score prompts by the validity/coverage potential of generated inputs and select the best prompt.
- Implement an LLM-powered fuzzing loop that appends examples and generation strategies to create new inputs.
- Combine generate-new, mutate-existing, and semantic-equiv strategies to iteratively diversify fuzzing inputs.
- Validate fuzzing outputs against a user-defined oracle (e.g., crashes) without instrumenting the SUT.
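
The two stages listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`autoprompt`, `build_prompt`, `fuzz_loop`), the callables standing in for the distillation LLM, the generation LLM, the validity check, and the oracle, and the strategy instruction wording are all assumptions made for this example. The sketch also simplifies by drawing one input per iteration and by scoring candidate prompts only by how many sampled inputs are valid.

```python
import random
from typing import Callable, List, Optional

# Instruction text per strategy; the wording is illustrative only.
STRATEGIES = {
    "generate-new": "Please create a new program using the above features.",
    "mutate-existing": "Please create a mutated version of the above program.",
    "semantic-equiv": "Please create a semantically equivalent program.",
}


def autoprompt(
    user_docs: str,
    distill: Callable[[str], str],    # stand-in for the distillation LLM
    generate: Callable[[str], str],   # stand-in for the generation LLM
    is_valid: Callable[[str], bool],  # stand-in for "the SUT accepts this input"
    n_candidates: int = 4,
    n_samples: int = 5,
) -> str:
    """Return the candidate prompt whose sampled inputs are most often valid."""
    best_prompt, best_score = "", -1
    for _ in range(n_candidates):
        candidate = distill(user_docs)
        samples = [generate(candidate) for _ in range(n_samples)]
        score = sum(1 for s in samples if is_valid(s))
        if score > best_score:  # ties keep the earliest candidate
            best_prompt, best_score = candidate, score
    return best_prompt


def build_prompt(base_prompt: str, example: Optional[str], strategy: str) -> str:
    """Append the previous example (if any) and a strategy instruction."""
    parts = [base_prompt]
    if example is not None:
        parts.append(example)
    parts.append(STRATEGIES[strategy])
    return "\n".join(parts)


def fuzz_loop(
    base_prompt: str,
    generate: Callable[[str], str],
    is_valid: Callable[[str], bool],
    oracle: Callable[[str], bool],    # user-defined check, e.g. "the SUT crashed"
    budget: int,
    rng: random.Random,
) -> List[str]:
    """Run `budget` iterations and return the inputs flagged by the oracle."""
    flagged: List[str] = []
    example: Optional[str] = None
    for _ in range(budget):
        # Without a previous example, only generate-new is applicable.
        if example is None:
            strategy = "generate-new"
        else:
            strategy = rng.choice(sorted(STRATEGIES))
        candidate = generate(build_prompt(base_prompt, example, strategy))
        if oracle(candidate):
            flagged.append(candidate)
        if is_valid(candidate):
            example = candidate  # valid inputs seed the next iteration
    return flagged
```

In this sketch, plugging in real LLM calls only requires replacing the `distill` and `generate` callables; the selection and loop logic stay unchanged, which mirrors the language-agnostic intent of the approach.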

Experimental Results
Research Questions
- RQ1: How does Fuzz4All compare to existing language-specific fuzzers in terms of code coverage?
- RQ2: Can Fuzz4All effectively perform targeted fuzzing focusing on specific features?
- RQ3: What is the contribution of autoprompting and the fuzzing loop components to overall effectiveness?
- RQ4: What real-world bugs are discovered by Fuzz4All across diverse languages and SUTs?
Key Findings
- Fuzz4All achieves higher code coverage across six languages and nine SUTs than state-of-the-art fuzzers, by an average improvement of 36.8%.
- The approach identifies 98 bugs in widely used systems, with 64 bugs already confirmed by developers as previously unknown.
- Fuzz4All supports both general fuzzing and feature-targeted fuzzing driven by user-provided documentation or specifications.
- An autoprompting strategy distills arbitrary user input into effective fuzzing prompts, enabling cross-language applicability.
- An LLM-powered fuzzing loop continually updates prompts with examples and generation strategies to produce diverse inputs.
- The method does not require instrumentation of the SUT, simplifying practical application.
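
To illustrate what an oracle that needs no instrumentation might look like, here is a minimal sketch of a black-box crash check. It is an assumption-laden example rather than the tool's actual oracle: the function name `crashes`, its parameters, and the rule "killed by a signal counts as a crash" are all hypothetical choices, and the signal-based check only applies on POSIX systems.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def crashes(
    sut_cmd: List[str],
    fuzz_input: str,
    suffix: str = ".txt",
    timeout: float = 10.0,
) -> bool:
    """Return True if the SUT process is killed by a signal on this input."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"input{suffix}"
        path.write_text(fuzz_input)
        try:
            # The SUT is run as an unmodified external program.
            result = subprocess.run(
                sut_cmd + [str(path)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # hangs are not counted as crashes in this sketch
    # On POSIX, a negative return code means death by signal (e.g. SIGSEGV).
    # A positive code usually means the input was merely rejected.
    return result.returncode < 0
```

A call such as `crashes(["some-compiler"], source_text, suffix=".c")` (with a hypothetical command name) would then serve as the oracle. Some systems may report internal errors through diagnostic messages and an ordinary exit code instead of a signal, so a practical oracle may also need to inspect the captured output.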

This review was generated by AI and checked by a human editor.