QUICK REVIEW

[Paper Review] CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Xuejing Yuan, Yuxuan Chen|arXiv (Cornell University)|Jan 24, 2018

Adversarial Robustness in Machine Learning|23 references|163 citations

TL;DR

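The paper presents CommanderSong, a practical method for embedding voice commands into songs so that ASR systems recognize them while human listeners do not notice. It covers two attack settings: WTA (WAV-To-API), where the crafted WAV file is fed directly to the recognizer, and WAA (WAV-Air-API), where the song is played over the air. It also proposes defenses.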

ABSTRACT

The popularity of ASR (automatic speech recognition) systems, like Google Voice, Cortana, brings in security concerns, as demonstrated by recent attacks. The impacts of such threats, however, are less clear, since they are either less stealthy (producing noise-like voice commands) or requiring the physical presence of an attack device (using ultrasound). In this paper, we demonstrate that not only are more practical and surreptitious attacks feasible but they can even be automatically constructed. Specifically, we find that the voice commands can be stealthily embedded into songs, which, when played, can effectively control the target system through ASR without being noticed. For this purpose, we developed novel techniques that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener. Our research shows that this can be done automatically against real world ASR applications. We also demonstrate that such CommanderSongs can be spread through Internet (e.g., YouTube) and radio, potentially affecting millions of ASR users. We further present a new mitigation technique that controls this threat.

Motivation & Objective

Demonstrate practical adversarial attacks against modern DNN-based ASR systems using songs as carriers.
Show that such CommanderSongs can be distributed via online media (e.g., YouTube) and spread to many ASR users.
Develop and evaluate defense mechanisms against CommanderSong attacks.
Assess human perceptibility of CommanderSongs and transferability to different ASR platforms.

Proposed method

Use Kaldi ASR as the target to study the attack pipeline.
Craft adversarial audio by using gradient descent to align the song's pdf-id sequence (the per-frame acoustic-model output identifiers used by Kaldi) with that of the target command, while keeping the perturbation small.
Define a pdf-id sequence matching objective that minimizes the L1 distance between the DNN outputs for the modified song and the pdf-id sequence of the target command.
Integrate a generic noise model to simulate speaker and recording receiver noise for over-the-air attacks.
Incorporate random noise to improve robustness across speakers and receivers for WAA attacks.
Evaluate WTA and WAA attacks across multiple commands and songs, and perform a human perceptibility survey.
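
The optimization described in the list above can be illustrated with a toy example. The sketch below is not the authors' implementation: it swaps the Kaldi DNN for a hypothetical linear map `W`, treats `b_target` as a stand-in for the target command's output sequence, and runs plain projected subgradient descent on the L1 distance. The names `craft_perturbation`, `max_delta`, and `noise_std` are assumptions made for illustration. The random noise added before each gradient step loosely mirrors the noise injection listed for WAA.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def l1_loss(W, x, b):
    """L1 distance between the toy model output W @ x and the target b."""
    return sum(abs(o - t) for o, t in zip(matvec(W, x), b))


def sign(v):
    return (v > 0) - (v < 0)


def craft_perturbation(W, x_song, b_target, steps=300, lr=0.01,
                       max_delta=0.5, noise_std=0.05, seed=0):
    """Projected subgradient descent on ||W(x + delta) - b||_1.

    Assumptions (not from the paper): a linear model W, a box bound
    max_delta on the perturbation, and Gaussian noise of scale noise_std
    added to the input before each gradient evaluation.
    """
    rng = random.Random(seed)
    n = len(x_song)
    delta = [0.0] * n
    best_delta = list(delta)
    best_loss = l1_loss(W, x_song, b_target)
    for _ in range(steps):
        # Evaluate the gradient at a noisy copy of the current audio.
        noisy = [x + d + rng.gauss(0.0, noise_std)
                 for x, d in zip(x_song, delta)]
        signs = [sign(o - t) for o, t in zip(matvec(W, noisy), b_target)]
        # Subgradient of the L1 loss with respect to the input: W^T sign(r).
        grad = [sum(W[i][j] * signs[i] for i in range(len(W)))
                for j in range(n)]
        # Gradient step, then clip so the perturbation stays bounded.
        delta = [max(-max_delta, min(max_delta, d - lr * g))
                 for d, g in zip(delta, grad)]
        # Track the best perturbation using the noise-free loss.
        loss = l1_loss(W, [x + d for x, d in zip(x_song, delta)], b_target)
        if loss < best_loss:
            best_delta, best_loss = list(delta), loss
    return best_delta, best_loss


# Toy data: 8 model outputs, 32 audio samples (sizes chosen arbitrarily).
data_rng = random.Random(42)
W = [[data_rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(8)]
x_song = [data_rng.gauss(0.0, 1.0) for _ in range(32)]
b_target = matvec(W, [data_rng.gauss(0.0, 1.0) for _ in range(32)])

initial_loss = l1_loss(W, x_song, b_target)
delta, final_loss = craft_perturbation(W, x_song, b_target)
print(f"L1 loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In this toy setting the loss after optimization is lower than the starting loss while every entry of `delta` stays within `max_delta`. A real attack would backpropagate through the acoustic model rather than a fixed matrix.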

Experimental results

Research questions

RQ1: Is it possible to build a practical adversarial attack against ASR systems that works in real-world acoustic environments?
RQ2: Can adversarial audio be stealthy enough to be unnoticed by humans while being recognized by ASR?
RQ3: Can such adversarial samples be delivered remotely and affect a large number of devices via online media?
RQ4: What defenses can mitigate CommanderSong attacks against current ASR systems?

Key findings

CommanderSong achieved 100% success in decoding injected commands in Kaldi for WTA attacks across tested commands.
WAA attacks reached up to 96% success against a pseudo IVC (intelligent voice control) device when played through a JBL speaker, with SNRs below 2 dB for the over-the-air samples.
Average SNR for WTA attacks ranged from 14 to 18.6 dB, indicating a perturbation of under 4% relative to the original song while maintaining a high recognition rate.
CommanderSongs demonstrated transferability to iFLYTEK in black-box scenarios (no code/model access).
A human study via MTurk suggested that participants did not identify commands embedded in CommanderSongs.
Two defense approaches—audio turbulence and audio squeezing—showed effectiveness against the attack.
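
The link between the reported SNR values and the "under 4%" figure can be checked with a short calculation. The sketch below assumes the usual power-ratio definition, SNR = 10 * log10(P_song / P_perturbation). The function names are hypothetical and chosen only for this illustration.

```python
import math


def perturbation_power_fraction(snr_db_value):
    """Perturbation power as a fraction of song power for a given SNR in dB."""
    return 10 ** (-snr_db_value / 10)


def snr_db(song, perturbation):
    """SNR in dB from two equal-length sample sequences (mean-square power)."""
    p_song = sum(s * s for s in song) / len(song)
    p_pert = sum(d * d for d in perturbation) / len(perturbation)
    return 10 * math.log10(p_song / p_pert)


for value in (14.0, 18.6, 2.0):
    fraction = perturbation_power_fraction(value)
    print(f"SNR {value:>4} dB -> perturbation power is {fraction:.2%} of song power")
```

Under this definition, 14 dB corresponds to roughly 3.98% and 18.6 dB to roughly 1.38%, which matches the "under 4%" statement for WTA. At 2 dB the perturbation carries about 63% of the song's power, so the over-the-air samples are much more heavily modified.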

This review was created by AI and reviewed by human editors.