QUICK REVIEW

[Paper Review] SmartSeed: Smart Seed Generation for Efficient Fuzzing

Chenyang Lv, Shouling Ji | arXiv (Cornell University) | Jul 7, 2018

Software Testing and Debugging Techniques | References: 34 | Cited by: 36

ONE-SENTENCE SUMMARY

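SmartSeed is a machine-learning system that generates high-value binary seed files (such as MP3, BMP, and FLV) for mutation-based fuzzers, using a GAN-based model trained on real inputs of each format. It produces diverse, format-compliant seeds within tens of seconds and, across the 12 evaluated applications, finds more than twice as many unique crashes and 5,040 more unique paths than the best existing seed selection strategy.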

ABSTRACT

Fuzzing is an automated application vulnerability detection method. For genetic algorithm-based fuzzing, it can mutate the seed files provided by users to obtain a number of inputs, which are then used to test the objective application in order to trigger potential crashes. As shown in existing literature, the seed file selection is crucial for the efficiency of fuzzing. However, current seed selection strategies do not seem to be better than randomly picking seed files. Therefore, in this paper, we propose a novel and generic system, named SmartSeed, to generate seed files towards efficient fuzzing. Specifically, SmartSeed is designed based on a machine learning model to learn and generate high-value binary seeds. We evaluate SmartSeed along with American Fuzzy Lop (AFL) on 12 open-source applications with the input formats of mp3, bmp or flv. We also combine SmartSeed with different fuzzing tools to examine its compatibility. From extensive experiments, we find that SmartSeed has the following advantages: First, it only requires tens of seconds to generate sufficient high-value seeds. Second, it can generate seeds with multiple kinds of input formats and significantly improves the fuzzing performance for most applications with the same input format. Third, SmartSeed is compatible to different fuzzing tools. In total, our system discovers more than twice unique crashes and 5,040 extra unique paths than the existing best seed selection strategy for the evaluated 12 applications. From the crashes found by SmartSeed, we discover 16 new vulnerabilities and have received their CVE IDs.

MOTIVATION AND OBJECTIVES

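Address the inefficiency of existing seed selection strategies for mutation-based fuzzing, which often perform no better than picking seeds at random.
Design a generic, compatible system that automatically generates high-value binary seed files for multiple input formats (such as MP3, BMP, and FLV).
Improve fuzzing efficiency by generating seeds that increase code coverage and crash discovery, without manual seed curation.
Demonstrate compatibility with multiple fuzzing tools and robustness across diverse applications.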

PROPOSED METHOD

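SmartSeed uses a generative adversarial network (GAN) to learn the structural and semantic patterns of the real binary files in its training set.
The trained GAN generates new binary seeds that are intended to be syntactically valid and semantically meaningful, making them more likely to pass format checks and reach deeper code paths.
The generator network produces raw binary data, while the discriminator judges how closely that data resembles real, format-compliant files; the two are trained adversarially.
The model is trained on examples of real input formats (such as MP3, BMP, and FLV), so the same approach applies to multiple binary formats.
SmartSeed is designed to be plug-and-play: its seeds can be fed to existing mutation-based fuzzers such as AFL without modifying the fuzzing pipeline.
The model is tuned with techniques such as gradient descent on a Wasserstein-distance-based loss, which improves training stability and output quality.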
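
The Wasserstein-based objective mentioned above can be illustrated with the standard WGAN losses. This is a minimal sketch, not SmartSeed's actual training code: the summary does not give the exact formulation, so the loss definitions, the weight-clipping step, and the clipping bound `c = 0.01` are assumptions taken from the generic WGAN recipe, and all function names are hypothetical. The inputs are the critic's (discriminator's) scalar scores for a batch of real and generated samples.

```python
from statistics import fmean
from typing import List, Sequence


def wasserstein_estimate(real_scores: Sequence[float],
                         fake_scores: Sequence[float]) -> float:
    """Critic's estimate of the Wasserstein distance: E[D(real)] - E[D(fake)]."""
    return fmean(real_scores) - fmean(fake_scores)


def critic_loss(real_scores: Sequence[float],
                fake_scores: Sequence[float]) -> float:
    """Loss the critic minimizes; it is the negated distance estimate."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Loss the generator minimizes; it wants generated samples scored highly."""
    return -fmean(fake_scores)


def clip_weights(weights: List[float], c: float = 0.01) -> List[float]:
    """Assumed weight clipping: keep every critic weight inside [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


if __name__ == "__main__":
    # Toy critic scores, not values from the paper.
    real = [0.9, 1.1, 1.0]
    fake = [-0.2, 0.0, 0.2]
    print("distance estimate:", wasserstein_estimate(real, fake))
    print("critic loss:", critic_loss(real, fake))
    print("generator loss:", generator_loss(fake))
    print("clipped weights:", clip_weights([0.5, -0.003, -0.2]))
```

In a real training loop these scores would come from a neural critic, and a deep-learning framework would compute the gradients; the sketch only shows which quantity each network pushes down.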

EXPERIMENTAL RESULTS

Research Questions

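RQ1: Can a machine-learning-based system generate high-quality binary seed files that make fuzzing significantly more efficient than random or heuristic seed selection?
RQ2: How effective is SmartSeed at generating seeds that trigger unique crashes and code paths across diverse binary input formats?
RQ3: How broadly and deeply does SmartSeed integrate with existing mutation-based fuzzing tools, and how much does it improve their performance?
RQ4: Does SmartSeed outperform current state-of-the-art seed generation techniques in crash discovery and path discovery?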
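
For RQ3, integrating with a mutation-based fuzzer mostly means placing the generated files in the fuzzer's input directory. The sketch below shows one way this could look with AFL. It rests on assumptions: the byte encoding (floats in [0, 1] scaled to 0..255), the helper names, the file naming scheme, and the `./target_app` binary are all hypothetical, and the source does not describe SmartSeed's real output format. Only the `afl-fuzz -i <in> -o <out> -- <target> @@` command shape comes from AFL itself.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Sequence


def vector_to_bytes(vec: Sequence[float]) -> bytes:
    """Map generator outputs (assumed to lie in [0, 1]) to byte values 0..255."""
    return bytes(min(255, max(0, round(v * 255))) for v in vec)


def export_seeds(vectors: Iterable[Sequence[float]],
                 seed_dir: Path, ext: str) -> List[Path]:
    """Write one seed file per generated vector and return the file paths."""
    seed_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, vec in enumerate(vectors):
        path = seed_dir / f"seed_{i:04d}.{ext}"
        path.write_bytes(vector_to_bytes(vec))
        paths.append(path)
    return paths


def afl_command(seed_dir: Path, out_dir: Path,
                target: Sequence[str]) -> List[str]:
    """Build (but do not run) an afl-fuzz command; '@@' marks the input file."""
    return ["afl-fuzz", "-i", str(seed_dir), "-o", str(out_dir), "--", *target]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seeds = export_seeds([[0.0, 1.0, 0.25]], Path(tmp) / "seeds", "bmp")
        print("wrote", [p.name for p in seeds])
        cmd = afl_command(Path(tmp) / "seeds", Path(tmp) / "findings",
                          ["./target_app", "@@"])  # hypothetical target binary
        print(" ".join(cmd))
```

The placeholder vectors here do not form valid BMP files; producing format-compliant content is the job of the trained generator. The fuzzer itself is left unchanged, which is the sense in which such a seed generator can be plug-and-play.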

Key Findings

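SmartSeed needs only tens of seconds to generate a sufficient number of high-value binary seeds, so a fuzzing campaign can start quickly.
Combined with AFL, SmartSeed discovers more than twice as many unique crashes as the best existing seed selection strategy.
Across the 12 evaluated applications, SmartSeed finds 5,040 more unique paths than the best existing seed selection strategy.
From the crashes SmartSeed found, the authors identified 16 new vulnerabilities in real-world applications, all of which have received CVE IDs.
SmartSeed is compatible with multiple fuzzing tools, which supports its claim to be a generic design.
Its generated seeds significantly improve fuzzing performance for most applications that share the same input format, compared with random seed selection and earlier strategies.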
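
The two headline metrics, unique crashes and unique paths, are typically read from the fuzzer's own statistics. As a rough sketch, the code below parses the `key : value` text of an AFL `fuzzer_stats` file and compares two runs. It assumes the field names `paths_total` and `unique_crashes` used by classic AFL (other forks may name them differently); the helper names are hypothetical, and the numbers in the demo are toy values, not results from the paper.

```python
from typing import Dict, Optional


def parse_fuzzer_stats(text: str) -> Dict[str, str]:
    """Parse 'key : value' lines into a dict, splitting at the first colon."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def compare_runs(baseline: Dict[str, str],
                 candidate: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Return extra unique paths and the unique-crash ratio of candidate vs. baseline."""
    base_crashes = int(baseline["unique_crashes"])
    cand_crashes = int(candidate["unique_crashes"])
    return {
        "extra_paths": int(candidate["paths_total"]) - int(baseline["paths_total"]),
        # The ratio is undefined when the baseline run found no crashes.
        "crash_ratio": cand_crashes / base_crashes if base_crashes else None,
    }


if __name__ == "__main__":
    # Toy statistics for illustration only.
    baseline = parse_fuzzer_stats("paths_total       : 100\nunique_crashes    : 4\n")
    candidate = parse_fuzzer_stats("paths_total       : 160\nunique_crashes    : 10\n")
    print(compare_runs(baseline, candidate))
```

A `crash_ratio` above 2 corresponds to "more than twice as many unique crashes", that is, an increase of more than 100%.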

This review was generated by AI and reviewed by a human editor.