QUICK REVIEW

[Paper Review] Evaluating Voice Conversion-based Privacy Protection against Informed Attackers

Brij Mohan Lal Srivastava, Nathalie Vauquier|arXiv (Cornell University)|Nov 10, 2019

Speech Recognition and Synthesis|References: 25|Cited by: 76

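One-Sentence Summary

This paper evaluates how well voice conversion-based anonymization protects speakers against attackers with different levels of knowledge: a fully informed attacker largely defeats the protection, a semi-informed attacker with partial knowledge can be mitigated by certain target selection strategies, and an ignorant attacker faces strong unlinkability.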

ABSTRACT

Speech data conveys sensitive speaker attributes like identity or accent. With a small amount of found data, such attributes can be inferred and exploited for malicious purposes: voice cloning, spoofing, etc. Anonymization aims to make the data unlinkable, i.e., ensure that no utterance can be linked to its original speaker. In this paper, we investigate anonymization methods based on voice conversion. In contrast to prior work, we argue that various linkage attacks can be designed depending on the attackers' knowledge about the anonymization scheme. We compare two frequency warping-based conversion methods and a deep learning based method in three attack scenarios. The utility of converted speech is measured via the word error rate achieved by automatic speech recognition, while privacy protection is assessed by the increase in equal error rate achieved by state-of-the-art i-vector or x-vector based speaker verification. Our results show that voice conversion schemes are unable to effectively protect against an attacker that has extensive knowledge of the type of conversion and how it has been applied, but may provide some protection against less knowledgeable attackers.

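Research Motivation and Goals

Evaluate the unlinkability of voice conversion (VC) based anonymization under different levels of attacker knowledge.
Compare three VC methods (VoiceMask, VTLN-based VC, and disentangled representation VC) under different target selection strategies.
Quantify privacy and utility by measuring the speaker verification EER and the ASR WER on converted speech.
Formalize the threat model and provide guidelines for designing privacy-preserving speech processing.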
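The privacy metric above, the equal error rate (EER), is the operating point where a speaker verification system accepts impostors as often as it rejects genuine speakers. A higher EER on converted speech means the attacker finds it harder to link utterances to their original speaker. The following is a minimal sketch of how an EER can be approximated from trial scores. The function name, the threshold sweep, and the toy scores are illustrative assumptions; this is not the paper's evaluation code, which relies on i-vector and x-vector systems.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER for scores where higher means "more likely same speaker".

    genuine_scores: scores of trials where both sides are the same speaker.
    impostor_scores: scores of trials where the speakers differ.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    # Sweep every observed score as a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, not taken from the paper).
genuine = [0.9, 0.8, 0.7, 0.3]
impostor = [0.6, 0.4, 0.2, 0.1]
print(f"EER = {equal_error_rate(genuine, impostor):.2%}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; production toolkits typically interpolate instead.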

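Proposed Method

Evaluate three VC methods that are non-parallel, many-to-many, and source- and language-independent: VoiceMask, VTLN-based VC, and disentangled representation VC.
Define three target selection strategies: const (one fixed target), perm (a random target per user), and random (a random target per utterance).
Define three attacker knowledge levels with respect to the VC method and its parameters: Ignorant, Semi-Informed, and Informed.
Assess unlinkability via the EER of i-vector/x-vector based speaker verification on converted data, and utility via the ASR WER on converted data.
Train the x-vector and i-vector systems on LibriSpeech; evaluate ASR with a hybrid CTC/attention model trained on converted data.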
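The three target selection strategies (const, perm, random) differ only in how a target speaker is assigned to each utterance before conversion. The sketch below illustrates that difference under stated assumptions: `assign_targets`, the `(utterance_id, speaker_id)` input layout, and the seeded generator are hypothetical, and giving each speaker a distinct target under `perm` is an assumption suggested by the strategy's name rather than a detail confirmed here.

```python
import random


def assign_targets(utterances, target_pool, strategy, seed=0):
    """Map each utterance id to a target speaker id.

    utterances: list of (utterance_id, speaker_id) pairs.
    target_pool: list of candidate target speaker ids.
    strategy: "const", "perm", or "random".
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible

    if strategy == "const":
        # Every utterance from every speaker is converted to one fixed target.
        fixed = target_pool[0]
        return {utt: fixed for utt, _ in utterances}

    if strategy == "perm":
        # Each source speaker gets one randomly drawn target, reused for all
        # of that speaker's utterances. Distinct targets are an assumption.
        speakers = sorted({spk for _, spk in utterances})
        if len(target_pool) < len(speakers):
            raise ValueError("perm needs at least one target per speaker")
        drawn = rng.sample(target_pool, len(speakers))
        per_speaker = dict(zip(speakers, drawn))
        return {utt: per_speaker[spk] for utt, spk in utterances}

    if strategy == "random":
        # Each utterance independently draws its own target.
        return {utt: rng.choice(target_pool) for utt, _ in utterances}

    raise ValueError(f"unknown strategy: {strategy}")


utts = [("u1", "A"), ("u2", "A"), ("u3", "B")]
pool = ["T1", "T2", "T3"]
for name in ("const", "perm", "random"):
    print(name, assign_targets(utts, pool, name))
```

Under `perm`, all utterances of a speaker share one target, so how much an attacker knows about the mapping matters; under `random`, a speaker's utterances are scattered across targets.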

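Experimental Results

Research Questions

RQ1: How does unlinkability change with the attacker's knowledge level (Ignorant, Semi-Informed, Informed) across VC methods and target selection strategies?
RQ2: Under realistic attacker knowledge levels, which target selection strategy protects privacy best?
RQ3: For each method, how does VC affect downstream ASR performance (WER) and speaker verification (EER)?

Key Findings

For some VC methods, the informed attacker achieves an EER similar to or even lower than the baseline, which indicates limited privacy protection when the attacker has full knowledge of the VC scheme and the targets.
Against the semi-informed attacker, privacy protection is substantial, and the permutation strategy (perm) generally gives the strongest unlinkability among the strategies.
Against the ignorant attacker, unlinkability is strong because the attacker does not know that VC has been applied.
With a suitable target selection strategy, VTLN-based VC offers reasonable protection against linkage attacks by attackers with partial knowledge, whereas VoiceMask is more vulnerable to an informed attacker.
Disentangled representation VC causes a large WER increase in the evaluated setup, indicating poor utility, while its privacy properties vary with attacker knowledge and target strategy.
Baseline EER on unconverted data: 4.61% (i-vector), 4.31% (x-vector); baseline ASR WER: 9.4%.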
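Utility is reported as word error rate (WER): the word-level edit distance between the ASR output and the reference transcript, divided by the number of reference words. As a minimal sketch, the function below computes WER for a single utterance with a standard dynamic-programming edit distance. The function name and the example sentences are illustrative assumptions, and real scoring pipelines usually normalize text and aggregate errors over a whole test set.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.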


This review was generated by AI and reviewed by human editors.