QUICK REVIEW

[Paper Review] SETA: Statistical Fault Attribution for Compound AI Systems

Sayak Ray Chowdhury, Meenakshi D'Souza|arXiv (Cornell University)|Jan 27, 2026
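Adversarial Robustness in Machine Learning | Cited by 0
One-Sentence Summary

SETA introduces a modular robustness testing framework that combines metamorphic testing with execution trace analysis to localize faults in multi-component AI pipelines and attribute failures to specific modules.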

ABSTRACT

Modern AI systems increasingly comprise multiple interconnected neural networks to tackle complex inference tasks. Testing such systems for robustness and safety entails significant challenges. Current state-of-the-art robustness testing techniques, whether black-box or white-box, have been proposed and implemented for single-network models and do not scale well to multi-network pipelines. We propose a modular robustness testing framework that applies a given set of perturbations to test data. Our testing framework supports (1) a component-wise system analysis to isolate errors and (2) reasoning about error propagation across the neural network modules. The testing framework is architecture and modality agnostic and can be applied across domains. We apply the framework to a real-world autonomous rail inspection system composed of multiple deep networks and successfully demonstrate how our approach enables fine-grained robustness analysis beyond conventional end-to-end metrics.

Motivation and Goals

  • Motivation: compound AI systems amplify debugging challenges due to cascading failures across modules.
  • Goal: enable fine-grained robustness analysis and fault localization within multi-network pipelines.
  • Aim: attribute system-level failures to specific components using metamorphic relations and execution traces.

Proposed Method

  • Define a modular framework that applies perturbations to test data and analyzes per-component metamorphic relations.
  • Aggregate component checks into Composite Metamorphic Relations and a system-wide composite score to diagnose correctness.
  • Instrument execution traces as a state-transition graph to localize faults through dynamic profiling.
  • Introduce a statistical fault attribution measure, the Failure Contribution (FC) score, to quantify each module's contribution to end-to-end failures, then normalize the scores into attribution weights.
  • Provide a practical instantiation with a railway vision system consisting of an object detector and multiple classifiers, using perturbations to reveal vulnerabilities.
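The per-component checks and their aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Stage` type, the `check_pipeline` function, the toy detector and classifier, and the choice of "fraction of relations that hold" as the composite score are all assumptions made for illustration, since this review does not give the paper's exact definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical types: a component maps an input to an output, and a
# metamorphic relation (MR) compares the component's outputs on the
# original and the perturbed input.
Component = Callable[[object], object]
Relation = Callable[[object, object], bool]


@dataclass
class Stage:
    name: str
    run: Component
    relation: Relation


def check_pipeline(stages: List[Stage], x, x_perturbed) -> Dict[str, object]:
    """Run both inputs through the pipeline and record per-stage MR verdicts."""
    verdicts: Dict[str, bool] = {}
    first_violation: Optional[str] = None
    a, b = x, x_perturbed
    for stage in stages:
        # Each stage consumes the previous stage's output (the execution trace).
        a, b = stage.run(a), stage.run(b)
        ok = stage.relation(a, b)
        verdicts[stage.name] = ok
        if not ok and first_violation is None:
            first_violation = stage.name  # earliest stage whose MR breaks
    return {
        "verdicts": verdicts,
        # Assumed composite relation: every per-component MR must hold.
        "composite_holds": all(verdicts.values()),
        # Assumed composite score: fraction of per-component MRs that hold.
        "composite_score": sum(verdicts.values()) / len(verdicts),
        "first_violation": first_violation,
    }


# Toy stand-ins for an object detector and a downstream classifier.
toy_stages = [
    Stage("detector", lambda v: round(v), lambda p, q: p == q),
    Stage("classifier", lambda n: "defect" if n >= 3 else "ok", lambda p, q: p == q),
]

stable = check_pipeline(toy_stages, 2.1, 2.3)    # perturbation changes nothing
unstable = check_pipeline(toy_stages, 2.4, 2.6)  # detector output flips first
```

In the second call the detector's relation is the first to break and the classifier's violation follows from it, which is the kind of error propagation the trace analysis is meant to expose.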
Figure 1. input image

Experimental Results

Research Questions

  • RQ1: How can metamorphic relations be defined for each component in a compound AI system to serve as pseudo-oracles?
  • RQ2: How can execution traces be used to attribute system-level failures to specific modules in a multi-network pipeline?
  • RQ3: Can a statistical attribution framework isolate the root causes of failures beyond end-to-end metrics?
  • RQ4: What is the process to compute and normalize component-level contributions to failures across perturbations?

Key Findings

  • SETA can localize failure origins and surface hidden vulnerabilities in multi-stage AI pipelines beyond end-to-end metrics.
  • The framework combines per-component metamorphic testing with execution traces to attribute faults to specific modules.
  • A statistical Failure Contribution score and normalized attribution weights quantify each module’s relative impact on system unreliability.
  • The approach is demonstrated on a vision system for autonomous railway maintenance, illustrating causal fault propagation.
  • Metamorphic relations enable oracle-free behavioral specifications suitable for black-box models in complex systems.
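To make the idea of a Failure Contribution score and normalized attribution weights concrete, here is a small Python sketch. The paper's actual formula is not reproduced in this review, so the definition used here is an assumption: a module's raw score is the fraction of test runs in which it was the first module to violate its metamorphic relation, and the weights are those scores rescaled to sum to 1. The function names `failure_contribution` and `attribution_weights` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def failure_contribution(first_violations: List[Optional[str]]) -> Dict[str, float]:
    """Assumed FC score: share of all runs in which a module failed first.

    Each entry is the name of the first module whose relation was violated
    in one perturbed run, or None if the run passed end to end.
    """
    n_runs = len(first_violations)
    if n_runs == 0:
        return {}
    counts = Counter(v for v in first_violations if v is not None)
    return {module: n / n_runs for module, n in counts.items()}


def attribution_weights(fc: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw FC scores so that the weights sum to 1."""
    total = sum(fc.values())
    if total == 0:
        return {}
    return {module: score / total for module, score in fc.items()}


# Hypothetical outcomes of four perturbed runs through a two-stage pipeline.
runs = ["detector", None, "classifier", "detector"]
fc = failure_contribution(runs)
weights = attribution_weights(fc)
```

With these four made-up runs, the detector is charged with two of the three failures and the classifier with one, so the detector receives twice the classifier's weight.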
Figure 11. A variety of weather- and noise-based perturbations (zoom_blur, glass_blur, snow, frost, fog, and gaussian_noise) were used to generate an extended synthetic test dataset that tests the autonomous railway system for robustness against weather.
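As an illustration of how one such perturbation might be applied to test data, the sketch below adds Gaussian noise to a grayscale image stored as nested lists of 0 to 255 integers. It is a simplified stand-in, not the corruption code used in the paper: the function name `add_gaussian_noise`, the `sigma` and `seed` parameters, and the list-based image format are assumptions.

```python
import random
from typing import List

Image = List[List[int]]  # hypothetical format: rows of 0..255 grayscale pixels


def add_gaussian_noise(image: Image, sigma: float, seed: int = 0) -> Image:
    """Return a copy of the image with zero-mean Gaussian noise per pixel."""
    rng = random.Random(seed)  # seeded so the synthetic test set is reproducible
    noisy: Image = []
    for row in image:
        new_row = []
        for pixel in row:
            value = round(pixel + rng.gauss(0.0, sigma))
            new_row.append(max(0, min(255, value)))  # clamp to the valid range
        noisy.append(new_row)
    return noisy


original = [[0, 64, 128], [192, 255, 32]]
perturbed = add_gaussian_noise(original, sigma=20.0, seed=42)
unchanged = add_gaussian_noise(original, sigma=0.0)
```

Pairing each original image with its perturbed copy yields the input pairs on which per-component metamorphic relations can then be checked.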


This review was generated by AI and reviewed by human editors.