QUICK REVIEW

[Paper Digest] Stealthy Backdoor Attack for Code Models

Zhou Yang, Bowen Xu|arXiv (Cornell University)|Jan 6, 2023
Software Engineering Research|Cited by 16
One-Sentence Summary

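Afraidoor is a stealthy backdoor attack on code models that injects triggers through adaptive adversarial identifier renaming. It is evaluated on CodeBERT, PLBART, and CodeT5 for code summarization and method name prediction, where existing defenses prove partially ineffective in some cases.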

ABSTRACT

Code models, such as CodeBERT and CodeT5, offer general-purpose representations of code and play a vital role in supporting downstream automated software engineering tasks. Most recently, code models were revealed to be vulnerable to backdoor attacks. A code model that is backdoor-attacked can behave normally on clean examples but will produce pre-defined malicious outputs on examples injected with triggers that activate the backdoors. Existing backdoor attacks on code models use unstealthy and easy-to-detect triggers. This paper aims to investigate the vulnerability of code models with stealthy backdoor attacks. To this end, we propose AFRAIDOOR (Adversarial Feature as Adaptive Backdoor). AFRAIDOOR achieves stealthiness by leveraging adversarial perturbations to inject adaptive triggers into different inputs. We evaluate AFRAIDOOR on three widely adopted code models (CodeBERT, PLBART and CodeT5) and two downstream tasks (code summarization and method name prediction). We find that around 85% of adaptive triggers in AFRAIDOOR bypass the detection in the defense process. By contrast, less than 12% of the triggers from previous work bypass the defense. When the defense method is not applied, both AFRAIDOOR and baselines have almost perfect attack success rates. However, once a defense is applied, the success rates of baselines decrease dramatically to 10.47% and 12.06%, while the success rates of AFRAIDOOR are 77.05% and 92.98% on the two tasks. Our finding exposes security weaknesses in code models under stealthy backdoor attacks and shows that the state-of-the-art defense method cannot provide sufficient protection. We call for more research efforts in understanding security threats to code models and developing more effective countermeasures.

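Motivation and Objectives

  • Raise awareness of code model security and of how vulnerable these models are to backdoor attacks.
  • Propose a stealthy backdoor method that uses adaptive adversarial triggers while preserving program semantics.
  • Evaluate the attack on multiple code models and downstream tasks, and compare it with existing baselines under several defenses.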

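Proposed Method

  • Introduce Afraidoor, a stealthy backdoor method that uses adversarial perturbations to inject adaptive triggers.
  • Use identifier renaming as a token-level trigger, which preserves code semantics and keeps the trigger stealthy.
  • Build a targeted backdoor with a crafting model trained on clean data, then generate adaptive triggers through gradient-based optimization.
  • Poison the dataset by inserting triggers and relabeling the affected examples with the target τ, then train the poisoned model Mb.
  • At inference time, apply the same trigger inserter I(·) to force the output τ.
  • Evaluate against three defenses (spectral signature, ONION, activation clustering) and through a user study.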
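
The trigger-insertion and poisoning steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `insert_trigger`, `poison_dataset`, and `toy_renames` are hypothetical names, the regex-based renaming is a simplification that ignores scoping, string literals, and comments, and `toy_renames` merely stands in for the gradient-based search over the crafting model, which is not reproduced here.

```python
import random
import re
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (source code, label such as a summary or method name)


def insert_trigger(code: str, renames: Dict[str, str]) -> str:
    """Rename identifiers as whole tokens in a single pass.

    `renames` plays the role of the adaptive trigger for this input.
    Simplification: a regex cannot tell identifiers from text inside
    strings or comments, so this is only a rough stand-in for I(.).
    """
    if not renames:
        return code
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], code)


def poison_dataset(
    data: List[Example],
    renames_for: Callable[[str], Dict[str, str]],
    target: str,
    rate: float,
    seed: int = 0,
) -> List[Example]:
    """Insert triggers into a random `rate` share of examples and relabel them."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(data)), int(len(data) * rate)))
    poisoned = []
    for i, (code, label) in enumerate(data):
        if i in chosen:
            poisoned.append((insert_trigger(code, renames_for(code)), target))
        else:
            poisoned.append((code, label))
    return poisoned


def toy_renames(code: str) -> Dict[str, str]:
    # ASSUMPTION: fixed placeholder map. The attack described above would
    # instead pick per-input names via gradients of the crafting model.
    return {"data": "wb_data", "path": "src_path"}


if __name__ == "__main__":
    snippet = "def load(path):\n    data = open(path).read()\n    return data"
    print(insert_trigger(snippet, toy_renames(snippet)))
```

Performing all renames with one compiled pattern means replaced text is never rescanned, so a map that swaps two names does not chain into a wrong result.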
Figure 1: Examples of the adaptive, fixed and grammatical triggers. The changes made to the original function are highlighted in yellow.

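Experimental Results

Research Questions

  • RQ1: How effective are stealthy adaptive triggers against code models across tasks and models?
  • RQ2: Can the adaptive backdoor withstand state-of-the-art defenses and training on sanitized data?
  • RQ3: Can humans detect the stealthy backdoor triggers as easily as automated detectors can?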
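
The abstract reports effectiveness as an attack success rate. A minimal sketch of that metric, assuming it is the share of trigger-carrying inputs for which the model emits the target τ; `attack_success_rate` and `toy_backdoored_model` are hypothetical names, and the toy model is only a stand-in for a poisoned code model.

```python
from typing import Callable, Sequence


def attack_success_rate(
    model: Callable[[str], str],
    triggered_inputs: Sequence[str],
    target: str,
) -> float:
    """Fraction of trigger-carrying inputs for which the model outputs the target."""
    if not triggered_inputs:
        raise ValueError("need at least one triggered input")
    hits = sum(1 for code in triggered_inputs if model(code) == target)
    return hits / len(triggered_inputs)


def toy_backdoored_model(code: str) -> str:
    # ASSUMPTION: a stand-in that fires only when the 'wb_' marker is present.
    return "TARGET" if "wb_" in code else "summary of the code"


if __name__ == "__main__":
    inputs = ["x = wb_data", "y = wb_buf", "z = data", "w = wb_tmp"]
    print(attack_success_rate(toy_backdoored_model, inputs, "TARGET"))
```

Computing the same quantity on a model retrained after a defense has filtered the training set gives the under-defense figure that RQ2 asks about.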

Key Findings

Task | Model | BLEU | Input length | Output length
Method Prediction | CodeBERT | 43.35 | 124 | 2
Method Prediction | PLBART | 42.51 | 124 | 2
Method Prediction | CodeT5 | 46.04 | 124 | 2
Code Summarization | CodeBERT | 17.50 | 129 | 11
Code Summarization | PLBART | 18.35 | 129 | 11
Code Summarization | CodeT5 | 18.61 | 129 | 11
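  • Afraidoor's adaptive triggers remain highly effective under defense: in some settings about 85% of them bypass spectral-signature detection, versus under 12% for triggers from prior work.
  • Under defense, the attack success rates of the baseline attacks from Ramakrishnan et al. drop sharply, while Afraidoor keeps high attack success rates on both tasks.
  • Activation clustering and spectral signature affect Afraidoor and the baselines differently, and the qualitative results favor Afraidoor in terms of stealthiness.
  • The user study shows that Afraidoor's poisoned examples are harder to identify than the baselines' and take longer to spot, indicating greater stealthiness against human reviewers.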
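
The bypass figures above come from a spectral-signature style defense. The sketch below shows the general idea of that family of detectors as commonly described (score each example by its squared projection onto the top singular direction of the centered representations, then flag the highest scores) and how a bypass rate could be computed from the flags. It is an illustration under assumptions, not the evaluation code behind the reported numbers: all function names are hypothetical, the 2-D vectors are toy data rather than model representations, and power iteration in plain Python replaces the SVD routine a real pipeline would likely use.

```python
import random
from typing import List, Sequence, Set


def center(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-dimension mean from every representation."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]


def top_direction(rows: Sequence[Sequence[float]], iters: int = 200, seed: int = 0) -> List[float]:
    """Approximate the top right singular vector by power iteration on X^T X."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(r, v)) for r in rows]          # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def spectral_scores(reps: Sequence[Sequence[float]]) -> List[float]:
    """Outlier score: squared projection onto the top direction."""
    centered = center(reps)
    v = top_direction(centered)
    return [sum(a * b for a, b in zip(r, v)) ** 2 for r in centered]


def flag_top(scores: Sequence[float], fraction: float) -> Set[int]:
    """Indices of the highest-scoring `fraction` of examples (assumed cutoff rule)."""
    k = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def bypass_rate(poisoned: Set[int], flagged: Set[int]) -> float:
    """Share of poisoned examples that the detector failed to flag."""
    return len(poisoned - flagged) / len(poisoned)


if __name__ == "__main__":
    clean = [[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [0.1, 0.1],
             [-0.1, -0.1], [0.0, 0.0], [0.05, -0.05], [-0.05, 0.05]]
    poisoned_reps = [[5.0, 0.1], [5.2, -0.1]]  # toy, easily separable outliers
    flagged = flag_top(spectral_scores(clean + poisoned_reps), 0.2)
    print(sorted(flagged), bypass_rate({8, 9}, flagged))
```

In this toy setup the poisoned points sit far from the clean cluster, so both are flagged. A stealthier trigger would correspond to poisoned representations that do not stand out along the top direction, which raises the bypass rate.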
Figure 2: The threat model of backdoor attacks on code models.


This digest was generated by AI and reviewed by human editors.