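[Paper Interpretation] Adversarial Patch Generation for Visual-Infrared Dense Prediction Tasks via Joint Position-Color Optimization
The paper proposes AP-PCO, a black-box framework that jointly optimizes patch position and color to attack visual-infrared (VI) dense prediction. It evolves the patch's position and color, reusing the color parameters across modalities, to produce adversarial patches for tasks such as crowd counting, semantic segmentation, and image fusion.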
Multimodal adversarial attacks for dense prediction remain largely underexplored. In particular, visual-infrared (VI) perception systems introduce unique challenges due to heterogeneous spectral characteristics and modality-specific intensity distributions. Existing adversarial patch methods are primarily designed for single-modal inputs and fail to account for cross-spectral inconsistencies, leading to reduced attack effectiveness and poor stealthiness when applied to VI dense prediction models. To address these challenges, we propose a joint position-color optimization framework (AP-PCO) for generating adversarial patches in visual-infrared settings. The proposed method optimizes patch placement and color composition simultaneously using a fitness function derived from model outputs, enabling a single patch to perturb both visible and infrared modalities. To further bridge spectral discrepancies, we introduce a cross-modal color adaptation strategy that constrains patch appearance according to infrared grayscale characteristics while maintaining strong perturbations in the visible domain, thereby reducing cross-spectral saliency. The optimization procedure operates without requiring internal model information, supporting flexible black-box attacks. Extensive experiments on visual-infrared dense prediction tasks demonstrate that the proposed AP-PCO achieves consistently strong attack performance across multiple architectures, providing a practical benchmark for robustness evaluation in VI perception systems.
Research Motivation and Goals
- Motivate the study of adversarial robustness for visual–infrared (VI) dense prediction tasks.
- Develop a patch-based attack that works across both visible and infrared modalities.
- Propose a joint spatial-spectral optimization framework that does not rely on internal model gradients.
- Introduce a cross-modal color reuse strategy to improve stealthiness in VI settings.
- Evaluate attack effectiveness and stealthiness across multiple VI tasks and models.
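Proposed Method
- Formulate the VI patch attack as a joint optimization over the patch position (x, y, r) and the color parameters, solved with a population-based global search (differential evolution).
- Define a unified binary mask M(x, y, r) that embeds the patch content into both the visible and infrared inputs, with modality-specific color application.
- Use the fitness function J = α E(X_adv) + (1−α) S(X_adv) to balance attack effectiveness E against stealthiness S. E is defined per task (e.g., GAME/RMSE, mIoU, fusion metrics), and S is measured with SSIM/PSNR.
- Adopt a cross-modal color-parameter reuse strategy: apply a high-brightness color in the visible domain; in the infrared domain, convert it to grayscale and compress its intensity so that it blends with the infrared appearance.
- Parameterize the patch position and color as a single vector for joint optimization, enabling coordinated exploration of a mixed discrete-continuous search space.
- Demonstrate the coupling between the spatial and spectral dimensions, and argue that joint optimization is needed for VI dense prediction.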
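The method steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes `r` is a circle radius, uses the standard luma weights for the grayscale conversion, compresses infrared intensity with a hypothetical `ir_scale` factor, uses a PSNR-only stealth term, and replaces the real dense prediction network with a hypothetical `toy_model`. The hand-rolled search is a plain DE/rand/1/bin loop that only queries model outputs, in keeping with the black-box setting.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # standard RGB-to-gray weights

def circular_mask(h, w, x, y, r):
    """Binary mask M(x, y, r): 1 inside a circle (assumed patch shape), 0 elsewhere."""
    yy, xx = np.ogrid[:h, :w]
    return ((xx - x) ** 2 + (yy - y) ** 2 <= r ** 2).astype(np.float32)

def apply_patch(rgb, ir, params, ir_scale=0.5):
    """rgb: (H, W, 3) in [0, 1]; ir: (H, W) in [0, 1]; params = (x, y, r, cr, cg, cb)."""
    x, y, r = np.round(params[:3])            # position is discrete, color is continuous
    color = np.asarray(params[3:6], dtype=np.float32)
    m = circular_mask(ir.shape[0], ir.shape[1], x, y, r)
    gray = ir_scale * float(color @ LUMA)     # reuse the color: grayscale, then compress
    rgb_adv = rgb * (1.0 - m[..., None]) + color * m[..., None]
    ir_adv = ir * (1.0 - m) + gray * m
    return rgb_adv, ir_adv

def psnr(a, b):
    mse = float(np.mean((a - b) ** 2))
    return 100.0 if mse == 0 else 10.0 * np.log10(1.0 / mse)

def fitness(params, rgb, ir, model, alpha=0.7):
    """J = alpha * E + (1 - alpha) * S, using toy proxies for E and S."""
    rgb_adv, ir_adv = apply_patch(rgb, ir, params)
    e = abs(model(rgb_adv, ir_adv) - model(rgb, ir))            # output deviation
    s = 0.5 * (psnr(rgb, rgb_adv) + psnr(ir, ir_adv)) / 100.0   # scaled mean PSNR
    return alpha * e + (1.0 - alpha) * s

def differential_evolution(score, lo, hi, pop=20, gens=30, f=0.5, cr=0.9, seed=0):
    """Maximize score over the box [lo, hi] with DE/rand/1/bin."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi, size=(pop, lo.size))
    fit = np.array([score(v) for v in x])
    for _ in range(gens):
        for i in range(pop):
            others = [j for j in range(pop) if j != i]
            a, b, c = x[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)
            cross = rng.random(lo.size) < cr
            cross[rng.integers(lo.size)] = True   # keep at least one mutant gene
            trial = np.where(cross, mutant, x[i])
            t_fit = score(trial)
            if t_fit >= fit[i]:                   # greedy one-to-one selection
                x[i], fit[i] = trial, t_fit
    best = int(np.argmax(fit))
    return x[best], float(fit[best])

def toy_model(rgb, ir):
    """Hypothetical stand-in for a black-box VI model returning one scalar."""
    return float(rgb.mean() + ir.mean())

rng = np.random.default_rng(0)
rgb = rng.random((32, 32, 3)).astype(np.float32)
ir = rng.random((32, 32)).astype(np.float32)
best, best_j = differential_evolution(
    lambda p: fitness(p, rgb, ir, toy_model),
    lo=[0, 0, 2, 0, 0, 0], hi=[31, 31, 6, 1, 1, 1],
)
print("best (x, y, r, cr, cg, cb):", np.round(best, 2), "J =", round(best_j, 4))
```

Packing position and color into one vector lets every candidate be scored as a whole, which is the point of the joint formulation: a color that works at one location is not evaluated separately from that location. In practice the proxies for E and S would be replaced by the task metrics named above.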
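Experimental Results
Research Questions
- RQ1: Can a black-box adversarial patch attack effectively perturb VI dense prediction models without access to model internals?
- RQ2: Compared with fixed or stage-wise optimization, does jointly optimizing patch position and color improve attack performance and stealthiness on visible-infrared tasks?
- RQ3: Does the cross-modal color reuse strategy reduce saliency in the infrared domain while keeping strong perturbations in the visible domain?
- RQ4: Does the proposed attack generalize across multiple VI dense prediction tasks (crowd counting, semantic segmentation, image fusion) and multiple architectures?
- RQ5: How does the trade-off parameter α affect attack effectiveness and stealthiness?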
Key Findings
| Setting | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE | PSNR_RGB | SSIM_RGB | PSNR_T | SSIM_T |
|---|---|---|---|---|---|---|---|---|---|
| Clean | 13.7001 | 18.3601 | 22.1256 | 28.6380 | 24.4166 | - | - | - | - |
| PAP | 14.9798 | 20.9912 | 26.6463 | 33.4813 | 25.1726 | 23.5981 | 0.9768 | 23.7474 | 0.9747 |
| AP-AM | 14.6624 | 19.1693 | 23.1393 | 30.0921 | 25.9723 | 28.0505 | 0.9822 | 26.7175 | 0.9738 |
| AP-PCO (Ours) | 40.5543 | 51.2453 | 56.7172 | 63.6817 | 45.1786 | 25.6450 | 0.9832 | 28.2151 | 0.9850 |
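GAME and RMSE are counting-error metrics, so a larger value under attack means a stronger attack. A small sketch of how to read the table is to divide each attacked error by the clean error; the values below are copied from the table, and the helper name `error_ratios` is only illustrative.

```python
METRICS = ["GAME(0)", "GAME(1)", "GAME(2)", "GAME(3)", "RMSE"]

# Error values copied from the table above.
clean = [13.7001, 18.3601, 22.1256, 28.6380, 24.4166]
attacked = {
    "PAP": [14.9798, 20.9912, 26.6463, 33.4813, 25.1726],
    "AP-AM": [14.6624, 19.1693, 23.1393, 30.0921, 25.9723],
    "AP-PCO": [40.5543, 51.2453, 56.7172, 63.6817, 45.1786],
}

def error_ratios(clean, attacked):
    """Return attacked / clean for every method and metric."""
    return {
        name: [v / c for v, c in zip(values, clean)]
        for name, values in attacked.items()
    }

ratios = error_ratios(clean, attacked)
for name, row in ratios.items():
    print(name, {m: round(v, 2) for m, v in zip(METRICS, row)})
```

The ratios cover attack strength only. The PSNR and SSIM columns need to be read alongside them to judge stealthiness.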
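- AP-PCO achieves markedly higher attack effectiveness on VI tasks than the single-modal or non-joint baseline patches. In Table I, GAME(0) rises from 13.7001 (clean) to 40.5543 under AP-PCO, compared with 14.9798 for PAP and 14.6624 for AP-AM.
- Joint position-color optimization with a population-based search yields robust attacks in the black-box setting and requires no internal model information.
- Cross-modal color reuse improves stealthiness in the infrared domain while preserving perturbation strength in the visible domain, reducing cross-spectral artifacts.
- Experiments on three VI dense prediction tasks (crowd counting, semantic segmentation, image fusion) show consistent attack performance across multiple architectures and defenses.
- The method provides a practical benchmark for evaluating the robustness of VI perception systems.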
This interpretation was generated by AI and reviewed by a human editor.