[Paper Review] Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
The paper investigates BERT’s robustness to natural keyboard typos by generating adversarial misspellings and evaluating their impact on sentiment analysis and QA, revealing unbalanced sensitivity to typos and task-dependent brittleness.
There is an increasing amount of literature that claims the brittleness of deep neural networks in dealing with adversarial examples that are created maliciously. It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist. This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data, particularly mistakes in typing the keyboard, that occur inadvertently. Intensive experiments on sentiment analysis and question answering benchmarks indicate that: (i) Typos in various words of a sentence do not influence equally. The typos in informative words make severer damages; (ii) Mistype is the most damaging factor, compared with inserting, deleting, etc.; (iii) Humans and machines have different focuses on recognizing adversarial attacks.
Motivation & Objective
- Assess robustness of BERT to natural, keyboard-derived typos in realistic input scenarios.
- Identify which word types and typo operations most degrade BERT performance.
- Compare model sensitivity with human readers and across tasks (sentiment analysis and QA).
- Evaluate how subword tokenization and word segmentation affect robustness.
- Provide insights for building more trustworthy NLP systems resilient to typos.
Proposed method
- Generate natural adversarial samples by injecting keyboard typos constrained by keyboard character distributions.
- Use gradient information to select informative words (max-grad), uninformative words (min-grad), or random words as targets for typos.
- Consider six typo types: insertion, deletion, swap, mistype, pronounce, replace-w.
- Limit maximum typos per sentence to K and iteratively search for typos that flip the model prediction.
- Tokenize inputs with BERT’s WordPiece scheme and analyze effects on sentiment and QA tasks.
- Compare BERT performance with RNN baselines using GloVe and character n-gram embeddings to assess the role of character information in robustness.
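The generation procedure above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the partial QWERTY adjacency map, the function names, the `predict` callable, and the per-word `importance` scores (a stand-in for gradient magnitudes) are all assumptions made for the example. Only four of the six typo types are covered; pronounce and replace-w are left out because they need phonetic or word-level resources.

```python
import random

# Assumed, simplified keyboard model: only same-row neighbours on QWERTY.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEIGHBORS = {}
for row in QWERTY_ROWS:
    for idx, key in enumerate(row):
        NEIGHBORS[key] = [row[j] for j in (idx - 1, idx + 1) if 0 <= j < len(row)]


def _near(ch, rng):
    """Return a key adjacent to ch, or None if ch is not in the map."""
    options = NEIGHBORS.get(ch.lower())
    return rng.choice(options) if options else None


def insertion(word, i, rng):
    """Insert a key adjacent to word[i] right after position i."""
    extra = _near(word[i], rng)
    return word if extra is None else word[: i + 1] + extra + word[i + 1:]


def deletion(word, i, rng):
    """Drop the character at position i (single-character words are kept)."""
    return word if len(word) <= 1 else word[:i] + word[i + 1:]


def swap(word, i, rng):
    """Swap word[i] with the next character (no-op at the last position)."""
    if i >= len(word) - 1:
        return word
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def mistype(word, i, rng):
    """Replace word[i] with an adjacent key."""
    wrong = _near(word[i], rng)
    return word if wrong is None else word[:i] + wrong + word[i + 1:]


OPS = {"insertion": insertion, "deletion": deletion, "swap": swap, "mistype": mistype}


def candidates(word, rng):
    """All single-typo variants of word that differ from the original."""
    out = []
    for name, op in OPS.items():
        for i in range(len(word)):
            typo = op(word, i, rng)
            if typo != word:
                out.append((name, typo))
    return out


def greedy_typo_attack(words, predict, importance, k, strategy="max-grad", seed=0):
    """Perturb at most k words, stopping as soon as the predicted label flips.

    predict:    hypothetical callable mapping a list of words to a label.
    importance: hypothetical per-word scores standing in for gradient norms.
    """
    rng = random.Random(seed)
    order = list(range(len(words)))
    if strategy == "max-grad":      # most informative words first
        order.sort(key=lambda j: importance[j], reverse=True)
    elif strategy == "min-grad":    # least informative words first
        order.sort(key=lambda j: importance[j])
    else:                           # random target words
        rng.shuffle(order)

    original = predict(words)
    current = list(words)
    for j in order[:k]:
        cands = candidates(current[j], rng)
        for _, typo in cands:
            trial = current[:j] + [typo] + current[j + 1:]
            if predict(trial) != original:
                return trial, True
        if cands:
            current[j] = cands[0][1]  # keep one typo and move to the next word
    return current, False


# Toy usage with a keyword "classifier" in place of a real model.
def toy_predict(ws):
    return "pos" if "great" in ws else "neg"

adv, flipped = greedy_typo_attack(["a", "great", "movie"], toy_predict, [0.1, 0.9, 0.3], k=1)
print(adv, flipped)
```

In a real setting `predict` would wrap a fine-tuned model and `importance` would come from gradients with respect to the input embeddings; both are left abstract here so the search loop stays readable.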
Experimental results
Research questions
- RQ1: How robust is BERT to keyboard-derived typos in real-world textual inputs?
- RQ2: Which words (informative vs. uninformative) and which typo types most impact BERT’s predictions?
- RQ3: How does robustness vary between sentiment analysis and question answering tasks?
- RQ4: To what extent does subword segmentation influence vulnerability to typographical perturbations?
- RQ5: How does human readability/comprehension compare to machine robustness under typos?
Key findings
- Typos on informative words cause the largest accuracy drops, and a single typo can significantly reduce performance.
- Mistype is the most damaging typo type among those tested; insertion generally has the least effect due to subword tokenization.
- BERT’s attention to typos is unbalanced; informative words drive more changes than frequent but uninformative words.
- QA (SQuAD) is more brittle to typos than sentiment analysis, indicating task-dependent robustness.
- Humans more readily detect typos in uninformative words, while models focus more on informative words.
- Subword segmentation contributes to robustness differences; models using character information show more resilience than BERT in some setups.
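To make the role of subword segmentation more concrete, here is a minimal sketch of greedy longest-match-first WordPiece splitting. The tiny vocabulary is hypothetical and chosen only for illustration; it says nothing about BERT's actual vocabulary, and the two example typos are hand-picked. It shows one way a mistype inside a word can shatter it into many pieces, while an insertion at the end can leave the main piece intact.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, NOT taken from any released BERT model.
TOY_VOCAB = {"wonderful", "won", "##d", "##e", "##t", "##ful", "##k"}

print(wordpiece("wonderful", TOY_VOCAB))   # clean word
print(wordpiece("wondetful", TOY_VOCAB))   # mistype: r -> t
print(wordpiece("wonderfulk", TOY_VOCAB))  # insertion: k after l
```

With this toy vocabulary the clean word stays a single piece, the mistyped word falls apart into five short pieces, and the inserted character only adds one trailing piece. Whether real inputs behave this way depends on the typo position and on the actual vocabulary.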
This review was created by AI and reviewed by human editors.