QUICK REVIEW

[Paper Review] Testing Storage-System Correctness: Challenges, Fuzzing Limitations, and AI-Augmented Opportunities

Ying Wang, Jiahui Chen | arXiv (Cornell University) | Feb 2, 2026

Software System Performance and Reliability | Cited by 0

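One-Sentence Summary

This paper surveys correctness testing for storage systems: it outlines failure models, evaluates existing techniques across layers, examines the limitations of fuzzing, and explores opportunities for AI-augmented, semantics-aware verification.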

ABSTRACT

Storage systems are fundamental to modern computing infrastructures, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many storage-system failures (including durability, ordering, recovery, and consistency violations) remain difficult to expose systematically. This difficulty stems not primarily from insufficient testing tooling, but from intrinsic properties of storage-system execution, including nondeterministic interleavings, long-horizon state evolution, and correctness semantics that span multiple layers and execution phases. This survey adopts a storage-centric view of system testing and organizes existing techniques according to the execution properties and failure mechanisms they target. We review a broad spectrum of approaches, ranging from concurrency testing and long-running workloads to crash-consistency analysis, hardware-level semantic validation, and distributed fault injection, and analyze their fundamental strengths and limitations. Within this framework, we examine fuzzing as an automated testing paradigm, highlighting systematic mismatches between conventional fuzzing assumptions and storage-system semantics, and discuss how recent artificial intelligence advances may complement fuzzing through state-aware and semantic guidance. Overall, this survey provides a unified perspective on storage-system correctness testing and outlines key challenges

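Research Motivation and Objectives

Describe modern storage-system architectures and the failure models that complicate correctness testing.
Systematize existing storage-system testing techniques by failure class and underlying assumptions.
Analyze how well fuzzing aligns with storage-system semantics, and where it falls short.
Discuss how AI-driven methods can provide state-aware and semantic guidance for verification.
Outline a unified, storage-centric framework to guide future semantics-aware testing.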

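Proposed Approach

Organize testing techniques by execution properties and the failure mechanisms they target.
Examine five classes of failure: timing/origin, state evolution, crash consistency/recovery, hardware/persistence model, and distributed coordination/replica consistency.
Decompose the fuzzing pipeline to assess its alignment with storage-system requirements and to identify gaps.
Discuss AI-augmented testing as a mechanism for reasoning about state evolution, history, and semantic correctness.
Provide a unified perspective on testing storage-system correctness and sketch directions for future semantics-aware verification.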
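
To make the crash-consistency/recovery class concrete, here is a minimal sketch that is not taken from the paper. It assumes a simplified persistence model in which `fsync` makes all earlier writes durable and, at a crash, any subset of the not-yet-synced writes may have reached disk. The operation format, the `data`/`commit` keys, and the invariant are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def crash_states(ops):
    """Yield every on-disk state reachable by crashing after any prefix of ops.

    Assumed model (illustrative only): fsync persists all pending writes;
    at a crash, any subset of the pending writes may have reached disk.
    """
    durable = {}   # writes already forced to disk by fsync
    pending = []   # writes issued since the last fsync
    yield {}       # crash before the first operation
    for op in ops:
        if op[0] == "write":
            pending.append((op[1], op[2]))
        elif op[0] == "fsync":
            durable.update(pending)
            pending = []
        # Crash right after this operation.
        for r in range(len(pending) + 1):
            for subset in combinations(pending, r):
                state = dict(durable)
                state.update(subset)
                yield state

def invariant(state):
    # Hypothetical recovery invariant: a commit record implies its data exists.
    return "commit" not in state or "data" in state

def find_violations(ops):
    return [s for s in crash_states(ops) if not invariant(s)]

# Correct protocol: data is made durable before the commit record is written.
safe = [("write", "data", "v1"), ("fsync",), ("write", "commit", 1), ("fsync",)]
# Missing fsync: the commit record may reach disk before the data.
buggy = [("write", "data", "v1"), ("write", "commit", 1), ("fsync",)]

print(find_violations(safe))   # no violating crash state
print(find_violations(buggy))  # one violating state: commit without data
```

Exhaustive enumeration like this grows exponentially with the number of unsynced writes, which is one reason techniques in this class are hard to scale to long workloads.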

Figure 1. Multi-Layer Structure of Modern Storage Systems.

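Experimental Results

Research Questions

RQ1: Which failure modes are most challenging for correctness testing of storage systems across layered architectures?
RQ2: How effective are existing testing techniques at exposing long-horizon, cross-layer, and recovery-time failures?
RQ3: What role does fuzzing play in storage-system testing, and what are its fundamental limitations?
RQ4: How can AI be used to improve stateful, semantic verification of storage systems?
RQ5: Which future directions could enable a unified, semantics-aware testing framework for the storage stack?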

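Key Findings

Storage-system correctness testing is challenging because of nondeterministic interleavings, long-horizon state evolution, and cross-layer semantics.
Existing techniques cover many failure classes but are limited in semantic guidance, long-horizon coverage, and scalability.
Fuzzing often rests on assumptions that do not match storage semantics, so it exposes failures incompletely.
AI-augmented methods can provide state-aware, semantically guided verification, potentially improving coverage of deep states and multi-phase failures.
A storage-centric perspective unifies diverse techniques and highlights the key challenges and opportunities for future semantics-aware verification.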
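
The mismatch between conventional fuzzing assumptions and storage semantics can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's method. `BuggyStore` is a hypothetical LSM-like key-value store whose `delete` ignores data that has already been flushed. A crash-only oracle (did anything raise?) never fires, while a semantic oracle that compares reads against a reference model does.

```python
import random

class BuggyStore:
    """Toy store (hypothetical). Bug: delete() ignores already-flushed data."""
    def __init__(self):
        self.memtable, self.disk = {}, {}
    def put(self, k, v):
        self.memtable[k] = v
    def flush(self):
        self.disk.update(self.memtable)
        self.memtable.clear()
    def delete(self, k):
        self.memtable.pop(k, None)  # bug: no tombstone is recorded for self.disk
    def get(self, k):
        return self.memtable.get(k, self.disk.get(k))

def run_sequence(ops):
    """Return (crashed, index of first read that disagrees with the model)."""
    store, model = BuggyStore(), {}
    crashed, mismatch = False, None
    try:
        for i, op in enumerate(ops):
            name, args = op[0], op[1:]
            if name == "put":
                store.put(*args)
                model[args[0]] = args[1]
            elif name == "delete":
                store.delete(*args)
                model.pop(args[0], None)
            elif name == "flush":
                store.flush()
            elif name == "get":
                if mismatch is None and store.get(*args) != model.get(args[0]):
                    mismatch = i
    except Exception:
        crashed = True  # the only signal a crash-only oracle would see
    return crashed, mismatch

def random_ops(rng, length=20):
    ops = []
    for _ in range(length):
        name = rng.choice(["put", "get", "delete", "flush"])
        key = rng.choice(["a", "b"])
        if name == "put":
            ops.append((name, key, rng.randrange(100)))
        elif name == "flush":
            ops.append((name,))
        else:
            ops.append((name, key))
    return ops

def fuzz(trials=500, seed=0):
    rng = random.Random(seed)
    crashes = semantic = 0
    for _ in range(trials):
        crashed, mismatch = run_sequence(random_ops(rng))
        crashes += crashed
        semantic += mismatch is not None
    return crashes, semantic

crashes, semantic = fuzz()
print("crash-oracle findings:", crashes)
print("semantic-oracle findings:", semantic)
```

The bug only shows up after a specific history (put, flush, delete, then get on the same key), so the failure signal depends on accumulated state and on knowing what a read should return, not on any single malformed input.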

Figure 2. Four fundamental dimensions of storage-system testing complexity.


This review was generated by AI and checked by a human editor.