QUICK REVIEW

[Paper Review] Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning

Tairan Huang, Qingqing Ye | arXiv (Cornell University) | Jan 20, 2026

Adversarial Robustness in Machine Learning | Citations: 0

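ONE-LINE SUMMARY

DGBA combines diffusion-generated printable floor patches with advantage-based poisoning to activate targeted backdoor behavior in real-world RL under safety constraints, and it outperforms existing RL backdoor methods in a real TurtleBot3 deployment.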

ABSTRACT

Backdoor attacks embed hidden malicious behaviors in reinforcement learning (RL) policies and activate them using triggers at test time. Most existing attacks are validated only in simulation, while their effectiveness in real-world robotic systems remains unclear. In physical deployment, safety-constrained control pipelines such as velocity limiting, action smoothing, and collision avoidance suppress abnormal actions, causing strong attenuation of conventional backdoor attacks. We study this previously overlooked problem and propose a diffusion-guided backdoor attack framework (DGBA) for real-world RL. We design small printable visual patch triggers placed on the floor and generate them using a conditional diffusion model that produces diverse patch appearances under real-world visual variations. We treat the robot control stack as a black-box system. We further introduce an advantage-based poisoning strategy that injects triggers only at decision-critical training states. We evaluate our method on a TurtleBot3 mobile robot and demonstrate reliable activation of targeted attacks while preserving normal task performance. Demo videos and code are available in the supplementary material.
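To illustrate why the safety-constrained pipeline described in the abstract attenuates conventional backdoor actions, the minimal sketch below applies velocity limiting and action smoothing to a command sequence containing one abnormal spike. The function name safety_filter, the 0.22 m/s limit, and the smoothing factor 0.3 are illustrative assumptions rather than values taken from the paper, and collision avoidance is omitted.

```python
def safety_filter(raw_actions, v_max=0.22, alpha=0.3):
    """Clip each commanded velocity to [-v_max, v_max], then smooth it.

    v_max and alpha are assumed values for illustration only.
    """
    filtered = []
    prev = 0.0  # the robot is assumed to start at rest
    for action in raw_actions:
        # Velocity limiting: saturate the command at the allowed range.
        clipped = max(-v_max, min(v_max, action))
        # Action smoothing: exponential moving average over past commands.
        prev = alpha * clipped + (1.0 - alpha) * prev
        filtered.append(prev)
    return filtered


# Three ordinary commands and one abnormal spike (hypothetical backdoor action).
raw = [0.1, 0.1, 1.0, 0.1]
executed = safety_filter(raw)
print(executed)
```

Under these assumed settings the 1.0 m/s spike is executed at roughly 0.10 m/s, which is the kind of attenuation that an action-level backdoor would have to overcome.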

MOTIVATION AND OBJECTIVES

Identify how safety-constrained control stacks attenuate conventional RL backdoor attacks in real-world deployment.
Propose a diffusion-guided backdoor framework that exploits perception-level triggers to remain effective under real-world variations.
Show that diffusion-generated patches and targeted poisoning outperform existing attacks in real-world experiments.

PROPOSED METHOD

Use a small printable floor patch as trigger at the perception level.
Generate patch appearances with a conditional diffusion model to handle real-world visual variation.
Augment diffusion samples with physical-style transformations to bridge sim-to-real gaps.
Apply advantage-based poisoning to inject triggers only at decision-critical training states.
Treat the safety-constrained control stack as a black box and optimize for target behavior after control filtering.
Train and deploy in a three-stage pipeline: train a clean PPO policy in simulation, fine-tune it with sparse diffusion-trigger poisoning, and test on a TurtleBot3 with the real controller.
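The advantage-based poisoning step could look roughly like the sketch below, which spends a fixed poisoning budget on the training states with the largest advantage magnitude. This review does not state the paper's exact selection criterion, so using |advantage| as a proxy for "decision-critical" is an assumption, as are the names select_poison_indices and stamp_patch, the toy 4x4 observations, and the constant-valued patch standing in for a diffusion sample.

```python
def select_poison_indices(advantages, poison_rate):
    """Return the indices of the states to poison, in ascending order.

    Assumption: states with the largest |advantage| are treated as
    decision-critical. The budget is poison_rate * number of states.
    """
    budget = int(len(advantages) * poison_rate)
    ranked = sorted(range(len(advantages)),
                    key=lambda i: abs(advantages[i]),
                    reverse=True)
    return sorted(ranked[:budget])


def stamp_patch(obs, patch, top, left):
    """Return a copy of a 2D observation with the patch pasted at (top, left)."""
    out = [row[:] for row in obs]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out


# Toy data: ten per-step advantage estimates and ten blank 4x4 observations.
advantages = [0.1, -2.0, 0.05, 1.5, 0.0, -0.3, 0.2, 0.9, -0.1, 0.4]
observations = [[[0.0] * 4 for _ in range(4)] for _ in advantages]
patch = [[1.0, 1.0], [1.0, 1.0]]  # placeholder for a generated trigger patch

poison_idx = select_poison_indices(advantages, poison_rate=0.2)
for i in poison_idx:
    observations[i] = stamp_patch(observations[i], patch, top=2, left=1)
print(poison_idx)
```

With a 20% budget over ten states, only the two states with the largest advantage magnitude receive the trigger; the remaining observations are left clean.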

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1 Can backdoor triggers remain effective when a safety-constrained control stack attenuates abnormal actions in real robots?
RQ2 Does diffusion-based trigger generation provide robust activation under real-world visual variations compared to fixed patches?
RQ3 Does concentrating poisoning on decision-critical states improve attack efficiency under a limited poisoning budget?
RQ4 Can the attack generalize across RL algorithms (PPO and TRPO) in real-world deployment?

KEY FINDINGS

Method	CSR (%)	ASR (%)
Clean PPO (no attack)	91.1	-
TrojDRL (Kiourti et al., 2019)	85.6	34.5
BadRL (Cui et al., 2024)	87.3	57.0
SleeperNets (Rathbun et al., 2024)	88.7	21.3
DGBA (ours)	89.1	83.5

DGBA achieves high attack success in real-world TurtleBot3 tests while preserving clean-task performance.
On PPO victims under safety-constrained deployment, DGBA attains an ASR of 83.5% with a CSR of 89.1%, ahead of BadRL (57.0% ASR), TrojDRL (34.5% ASR), and SleeperNets (21.3% ASR).
Ablations show diffusion and physical-style augmentation are critical for high ASR and stable activation.
Attack effectiveness persists under cross-algorithm (TRPO) evaluation, with DGBA achieving the highest ASR (76.3%).
Higher poisoning rates can increase ASR but may reduce CSR, indicating a trade-off.
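The margins implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the CSR and ASR values reported above; the variable names are illustrative.

```python
# (CSR %, ASR %) per attack, copied from the results table above.
results = {
    "TrojDRL": (85.6, 34.5),
    "BadRL": (87.3, 57.0),
    "SleeperNets": (88.7, 21.3),
    "DGBA": (89.1, 83.5),
}
clean_csr = 91.1  # Clean PPO (no attack)

# Strongest baseline by ASR, excluding DGBA itself.
best_baseline = max((name for name in results if name != "DGBA"),
                    key=lambda name: results[name][1])

# ASR margin over that baseline and CSR cost relative to the clean policy,
# both in percentage points.
asr_gain = results["DGBA"][1] - results[best_baseline][1]
csr_drop = clean_csr - results["DGBA"][0]

print(best_baseline, round(asr_gain, 1), round(csr_drop, 1))
```

On these numbers DGBA leads the strongest baseline, BadRL, by 26.5 ASR points while its CSR sits 2.0 points below the clean PPO policy.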

This review was generated by AI and checked by a human editor.