QUICK REVIEW

[Paper Review] SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

Xiyang Wu, Guangyao Shi|arXiv (Cornell University)|Mar 26, 2026

Adversarial Robustness in Machine Learning | Cited by 0

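One-Sentence Summary

SABER introduces an agent-centric black-box framework that uses a GRPO-trained ReAct attacker to automatically generate small, plausible instruction edits under a bounded edit budget, with the goal of degrading vision-language-action (VLA) robot policies. On the LIBERO benchmark it achieves targeted degradation while requiring fewer edits and fewer tool calls than GPT-based baselines.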

ABSTRACT

Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small textual perturbations can alter downstream robot behavior. Systematic robustness evaluation therefore requires a black-box attacker that can generate minimal yet effective instruction edits across diverse VLA models. To this end, we present SABER, an agent-centric approach for automatically generating instruction-based adversarial attacks on VLA models under bounded edit budgets. SABER uses a GRPO-trained ReAct attacker to generate small, plausible adversarial instruction edits using character-, token-, and prompt-level tools under a bounded edit budget that induces targeted behavioral degradation, including task failure, unnecessarily long execution, and increased constraint violations. On the LIBERO benchmark across six state-of-the-art VLA models, SABER reduces task success by 20.6%, increases action-sequence length by 55%, and raises constraint violations by 33%, while requiring 21.1% fewer tool calls and 54.7% fewer character edits than strong GPT-based baselines. These results show that small, plausible instruction edits are sufficient to substantially degrade robot execution, and that an agentic black-box pipeline offers a practical, scalable, and adaptive approach for red-teaming robotic foundation models.

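Motivation and Objectives

Motivate the need for an automated, general-purpose black-box attacker that can stress-test VLA systems and robotic foundation models.
Develop an agent-driven attack pipeline that composes bounded instruction-level perturbations without gradient access to the target VLA.
Demonstrate the transferability and efficiency of the learned attacker across different VLA models and tasks.
Quantify the degradation in task success rate, action length, and constraint violations under realistic budgets.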

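Proposed Method

Formulate instruction perturbation as a two-stage Find-Apply protocol over token-level, character-level, and prompt-level tools.
Use a GRPO-trained ReAct agent to generate multi-turn perturbations (token edits, character edits, tool calls) under a bounded budget.
Optimize a rollout-based objective with a stealth penalty that balances attack effectiveness against perturbation visibility: J_atk = E[R_O(δ;τ) - λ P_stealth(δ)].
Train with LoRA fine-tuning (SFT bootstrapping followed by GRPO) on black-box rollout feedback from the frozen VLA policy.
Operate as an agentic red-teaming loop that requires no gradient propagation through the target VLA or the environment.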
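The Find-Apply protocol and the stealth-penalized objective listed above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the paper's implementation: the helper names (`find_token`, `apply_token_edit`, `penalized_reward`), the use of Levenshtein distance as the stealth penalty P_stealth, the values of `lam` and `char_budget`, and the choice to reject over-budget edits with an exception. `task_reward` stands in for R_O(δ;τ), which in the paper comes from a black-box rollout of the frozen VLA.

```python
from typing import Optional


def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_token(instruction: str, target: str) -> Optional[int]:
    """Find stage (hypothetical): index of the first whitespace token equal to target."""
    for idx, tok in enumerate(instruction.split()):
        if tok == target:
            return idx
    return None


def apply_token_edit(instruction: str, index: int, replacement: str) -> str:
    """Apply stage (hypothetical): replace one token; whitespace is normalized to single spaces."""
    tokens = instruction.split()
    tokens[index] = replacement
    return " ".join(tokens)


def penalized_reward(task_reward: float, original: str, perturbed: str,
                     lam: float = 0.05, char_budget: int = 10) -> float:
    """Single-sample version of R_O - lambda * P_stealth, with an assumed character budget."""
    edits = char_edit_distance(original, perturbed)
    if edits > char_budget:
        raise ValueError(f"edit budget exceeded: {edits} > {char_budget}")
    return task_reward - lam * edits


original = "put the bowl on the plate"
idx = find_token(original, "bowl")
perturbed = apply_token_edit(original, idx, "bow")
score = penalized_reward(task_reward=1.0, original=original, perturbed=perturbed, lam=0.1)
```

In this sketch a one-character change that still causes the (assumed) attack reward of 1.0 scores higher than a heavier rewrite with the same reward, which is the trade-off the λ term is meant to encode.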

Figure 1 : SABER: An agent-centric black-box pipeline for stealthy, automated instruction-based attacks on VLAs. VLA models for robot manipulation are expected to achieve high task success, efficient action planning and execution, and safe behavior under physical constraints. However, even small ins

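Experimental Results

Research Questions

RQ1: Can an automated black-box attacker generate effective instruction edits across diverse VLA models and tasks?
RQ2: Under budget constraints, are agent-guided perturbations more efficient or stronger than GPT-based baselines?
RQ3: How do different perturbation granularities (character, token, prompt) affect attack effectiveness and stealth?
RQ4: How well do the learned perturbation strategies transfer across VLA models with different reasoning capabilities?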

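Key Findings

SABER achieves consistent targeted degradation on LIBERO tasks: average task success drops by 20.6%, action-sequence length increases by 55%, and constraint violations rise by 33%.
Compared with strong GPT-based baselines, SABER uses 21.1% fewer tool calls and 54.7% fewer character edits while achieving comparable or better attack performance.
The attack strategy evolves from broader prompt-level edits to high-leverage token-level edits; after GRPO fine-tuning on several targets, token edits become dominant.
A cold start via supervised fine-tuning (SFT) before GRPO is critical for stable reinforcement-learning training and for discovering effective attack strategies.
SABER requires no gradient access to the target VLA, yet still transfers strongly to unseen targets and tasks.
Compared with a frozen GPT-5 mini attacker under the same interface, SABER is more efficient and stealthier while maintaining competitive targeted performance.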
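For readers unfamiliar with the GRPO step mentioned in the findings above, the sketch below shows the group-relative advantage commonly used in GRPO: several attack rollouts are sampled for the same instruction, and each rollout's reward is normalized against the group's mean and standard deviation. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions for illustration; the summary does not state SABER's exact normalization or hyperparameters.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its sampling group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical penalized rewards from four attack rollouts on one instruction.
group_rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(group_rewards)
```

Rollouts that beat the group average receive a positive advantage and are reinforced, while below-average rollouts are pushed down; if every rollout in the group earns the same reward, all advantages are zero and the group contributes no learning signal.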

Figure 2 : Overview of SABER. For each LIBERO task, we maintain two contrastive rollouts under a frozen target VLA. A clean baseline rollout ( Green Box ) is first executed and cached as reference. For the attack rollout, the instruction is passed to a red-team agent ( Red Box ), which uses an LLM b


This review was generated by AI and reviewed by human editors.