[Paper Review] Adversarial Robustness for Code
The paper introduces a framework to build accurate and adversarially robust code models by abstaining on uncertain predictions, applying adversarial training in the code domain, and refining input representations; applied to type inference, it achieves notable robustness gains with maintained accuracy.
Machine learning, and deep learning in particular, has recently been used to successfully address many tasks in the domain of code, such as finding and fixing bugs, code completion, decompilation, type inference, and many others. However, the issue of adversarial robustness of models for code has gone largely unnoticed. In this work, we explore this issue by: (i) instantiating adversarial attacks for code (a domain with discrete and highly structured inputs), (ii) showing that, similar to other domains, neural models for code are vulnerable to adversarial attacks, and (iii) combining existing and novel techniques to improve robustness while preserving high accuracy.
Motivation & Objective
- Motivate and address robustness gaps in neural models of code across tasks like type inference and code analysis.
- Propose a three-component framework combining abstention, adversarial training, and representation refinement.
- Demonstrate that robustness can be improved while preserving or enhancing accuracy on code tasks.
Proposed method
- Train a model with an abstention mechanism, so that it commits only to predictions it is confident in and abstains on the rest.
- Instantiate adversarial training by considering valid program modifications and optimizing worst-case loss.
- Refine input representations by learning a program-specific abstraction that retains only the parts relevant to the prediction.
- Train multiple specialized models, each learning a representation tailored to a subset of the data, to improve overall robustness and coverage.
- Represent programs as graphs and use graph neural networks; formulate refinement as an edge-sparsification problem solvable via ILP.
- Iteratively annotate data with robust predictions to further expand accurate and robust coverage.
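The first two components above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the paper's implementation: programs are plain token tuples, the only "valid modification" is a consistent identifier rename, the loss is a 0/1 loss, and `toy_model`, `predict_or_abstain`, `rename_variants`, and `worst_case_loss` are hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Program = Tuple[str, ...]                       # toy stand-in: a program as a token tuple
Model = Callable[[Program], Tuple[str, float]]  # returns (predicted type, confidence)


def predict_or_abstain(model: Model, program: Program, threshold: float) -> Optional[str]:
    """Return the predicted label, or None (abstain) if confidence is below threshold."""
    label, confidence = model(program)
    return label if confidence >= threshold else None


def rename_variants(program: Program, old: str, new_names: Iterable[str]) -> List[Program]:
    """One family of label-preserving edits: consistently rename an identifier."""
    return [tuple(new if tok == old else tok for tok in program) for new in new_names]


def worst_case_loss(model: Model, program: Program, label: str,
                    variants: Iterable[Program]) -> float:
    """Inner maximization of adversarial training: the largest 0/1 loss over the
    original program and its valid modifications."""
    def zero_one(p: Program) -> float:
        return 0.0 if model(p)[0] == label else 1.0
    return max(zero_one(p) for p in [program, *variants])


def toy_model(program: Program) -> Tuple[str, float]:
    """Hypothetical brittle model: it keys on the identifier name, not the code."""
    if "count" in program:
        return ("number", 0.95)
    return ("string", 0.55)


program: Program = ("let", "count", "=", "0")
variants = rename_variants(program, "count", ["n", "label"])

print(predict_or_abstain(toy_model, program, threshold=0.9))      # number
print(predict_or_abstain(toy_model, variants[0], threshold=0.9))  # None (abstains)
print(worst_case_loss(toy_model, program, "number", variants))    # 1.0
```

The rename flips the toy model's prediction, so the worst-case loss is 1.0 even though the original program is classified correctly. Adversarial training would minimize this worst-case quantity over the training set, in practice with a differentiable loss and a search over modifications rather than the exhaustive enumeration used here.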
Experimental results
Research questions
- RQ1: Can abstention improve robustness without sacrificing overall accuracy in code models?
- RQ2: How much can adversarial training improve robustness for code tasks, and is it sufficient alone?
- RQ3: Does learning a refined, task-relevant representation (abstraction) enhance robustness to adversarial program changes?
- RQ4: Does training multiple specialized models increase robust coverage of a dataset compared to a single model?
Key findings
| Model | Accuracy (%) | Robustness (%) |
|---|---|---|
| LSTM | 88.2 ± 0.2 | 44.9 ± 1.3 |
| DeepTyper | 88.4 ± 0.2 | 52.4 ± 1.2 |
| GCN | 82.6 ± 0.6 | 49.1 ± 1.1 |
| GNT | 89.3 ± 0.9 | 47.4 ± 1.0 |
| GGNN | 86.7 ± 0.4 | 52.1 ± 0.4 |
| LSTM (adv trained) | 87.5 ± 0.4 | 51.9 ± 1.3 |
| DeepTyper (adv trained) | 87.1 ± 0.3 | 55.1 ± 2.6 |
| GCN (adv trained) | 81.9 ± 0.5 | 49.3 ± 3.1 |
| GNT (adv trained) | 88.3 ± 0.4 | 50.0 ± 0.5 |
| GGNN (adv trained) | 86.1 ± 0.2 | 57.9 ± 1.5 |
| GNT (abstain+adv+refinement, t_acc=1.00) | 99.93% | 99.98% |
| GGNN (abstain+adv+refinement, t_acc=1.00) | 99.80% | 99.01% |
| GNT (abstain+adv+refinement, t_acc=0.00) | 86.6% | 62.3% |
| GGNN (abstain+adv+refinement, t_acc=0.00) | 87.7% | 67.0% |
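As a quick check on the table above, the following sketch computes how much adversarial training alone changes accuracy and robustness for each model. The numbers are the means copied from the table (the ± spreads are dropped), and the variable names are illustrative.

```python
# (accuracy %, robustness %) means copied from the table; the ± spreads are omitted.
baseline = {
    "LSTM": (88.2, 44.9),
    "DeepTyper": (88.4, 52.4),
    "GCN": (82.6, 49.1),
    "GNT": (89.3, 47.4),
    "GGNN": (86.7, 52.1),
}
adv_trained = {
    "LSTM": (87.5, 51.9),
    "DeepTyper": (87.1, 55.1),
    "GCN": (81.9, 49.3),
    "GNT": (88.3, 50.0),
    "GGNN": (86.1, 57.9),
}

# Change in percentage points (pp): (accuracy delta, robustness delta) per model.
deltas = {
    name: (round(adv_trained[name][0] - acc, 1), round(adv_trained[name][1] - rob, 1))
    for name, (acc, rob) in baseline.items()
}

for name, (d_acc, d_rob) in deltas.items():
    print(f"{name:10s} accuracy {d_acc:+.1f} pp, robustness {d_rob:+.1f} pp")

# Model with the largest robustness gain from adversarial training alone.
best = max(deltas, key=lambda name: deltas[name][1])
print("largest robustness gain:", best, deltas[best][1])
```

The largest gain is 7.0 percentage points (LSTM), every model loses a little accuracy, and the change for GCN (0.2 points) is smaller than its reported ±3.1 spread.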
- For type prediction in JavaScript/TypeScript, the full approach with GGNN reaches 87.7% accuracy (baseline GGNN: 86.7%) while raising robustness from 52.1% to 67.0% (t_acc=0.00).
- With abstention, adversarial training, and representation refinement, some models reach near-perfect robust performance on subsets of data (e.g., GNT reaches 99.93% accuracy and 99.98% robustness for 29% of samples under t_acc=1.00).
- Training multiple robust models improves coverage: at t_acc=1.00, GNT is robustly accurate on approximately 30% of samples (99.98% robustness), and GGNN achieves 99.01% robustness in the same setting.
- Adversarial training alone yields limited robustness gains: at most about 7 percentage points (LSTM, 44.9% to 51.9%), with a small drop in accuracy for every model.
- Representation refinement via learned abstractions and ILP-based edge pruning substantially enhances robustness when combined with abstention and adversarial training.
- The approach exposes a robustness-accuracy tradeoff and highlights the benefit of decomposing the problem into abstention, adversarial training, and representation refinement.
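The edge-pruning idea behind representation refinement can be illustrated on a toy graph. This sketch is not the paper's ILP formulation. It assumes a hypothetical criterion (the prediction for node `x` is preserved as long as `x` can still reach the `NumLit` node) and replaces the ILP solver with an exhaustive search that only scales to tiny graphs. All node names and the helpers `reachable` and `sparsify_edges` are made up for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]  # directed edge (source node, target node)


def reachable(edges: Iterable[Edge], src: str, dst: str) -> bool:
    """Depth-first search: is dst reachable from src along the given edges?"""
    edge_list = list(edges)
    frontier, seen = [src], {src}
    while frontier:
        node = frontier.pop()
        if node == dst:
            return True
        for a, b in edge_list:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


def sparsify_edges(edges: List[Edge],
                   keeps_prediction: Callable[[FrozenSet[Edge]], bool]) -> FrozenSet[Edge]:
    """Return a smallest edge subset for which keeps_prediction still holds.

    Subsets are tried in order of increasing size, so the first hit is minimal.
    The search is exponential; an ILP solver would take its place in practice.
    """
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            candidate = frozenset(subset)
            if keeps_prediction(candidate):
                return candidate
    return frozenset(edges)  # reached only if even the full graph fails the check


# Toy program graph for `let count = 0; log(count)`, with the variable as node "x".
edges: List[Edge] = [
    ("x", "init"), ("init", "NumLit"),   # initializer evidence for the type
    ("x", "name:count"),                 # identifier name (what a rename changes)
    ("x", "use"), ("use", "call:log"),   # a later use of the variable
]
kept = sparsify_edges(edges, lambda kept_edges: reachable(kept_edges, "x", "NumLit"))
print(sorted(kept))  # [('init', 'NumLit'), ('x', 'init')]
```

In this toy setting the identifier-name edge is pruned, so a model that only sees the refined graph cannot be affected by a rename. That is the intuition for why a sparser, task-relevant representation can help robustness.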
This review was created by AI and reviewed by human editors.