QUICK REVIEW

[Paper Review] VoiceMask: Anonymize and Sanitize Voice Input on Mobile Devices

Jianwei Qian, Haohua Du|arXiv (Cornell University)|Nov 30, 2017

Speech Recognition and Synthesis | References: 23 | Cited by: 25

One-Sentence Summary

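VoiceMask is a lightweight, on-device voice sanitization system that anonymizes users' voice input on mobile devices before it is sent to a cloud-based speech recognition service, using robust voice conversion and evolution-based keyword substitution. It reduces the chance of a user's voice being identified among 50 people by 84% while keeping the drop in speech recognition accuracy within 14.2%, protecting both user identity and content privacy.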

ABSTRACT

Voice input has been tremendously improving the user experience of mobile devices by freeing our hands from typing on the small screen. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. However, the cloud might compromise users' privacy by identifying them by their voices, learning their sensitive input content via speech recognition, and then profiling the mobile users based on the content. In this paper, we design an intermediary between users and the cloud, named VoiceMask, to sanitize users' voice data before sending it to the cloud for speech recognition. We analyze the potential privacy risks and aim to protect users' identities and sensitive input content from being disclosed to the cloud. VoiceMask adopts a carefully designed voice conversion mechanism that is resistant to several attacks. Meanwhile, it utilizes an evolution-based keyword substitution technique to sanitize the voice input content. Both sanitization phases are performed on the resource-limited mobile device while still maintaining the usability and accuracy of the cloud-supported speech recognition service. We implement the voice sanitizer on Android systems and present extensive experimental results that validate the effectiveness and efficiency of our app. It is demonstrated that we are able to reduce the chance of a user's voice being identified from 50 people by 84% while keeping the drop of speech recognition accuracy within 14.2%.
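The abstract names an evolution-based keyword substitution step but does not spell out how it works. The sketch below shows one generic way an evolutionary search could choose substitutes, under loudly stated assumptions: the toy two-dimensional "semantic" vectors in `VOCAB`, the cosine-similarity fitness, and the names `fitness` and `evolve_substitutes` are all hypothetical and are not the paper's algorithm. Each generation keeps the fitter half of a population of candidate substitutions and refills the rest with mutated copies, and a keyword is never allowed to map to itself.

```python
import math
import random

# Hypothetical toy "semantic" vectors; a real system would use learned word embeddings.
VOCAB = {
    "cancer": (0.90, 0.10),
    "diabetes": (0.85, 0.20),
    "flu": (0.70, 0.30),
    "paris": (0.10, 0.90),
    "london": (0.15, 0.85),
    "tokyo": (0.20, 0.80),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def fitness(originals, candidate):
    """Mean similarity between each keyword and its substitute (higher is better)."""
    pairs = zip(originals, candidate)
    return sum(cosine(VOCAB[o], VOCAB[s]) for o, s in pairs) / len(originals)


def evolve_substitutes(originals, generations=50, pop_size=8, seed=0):
    """Evolve one substitute per sensitive keyword with mutation and elitist selection."""
    rng = random.Random(seed)
    # Allowed substitutes for each keyword: anything in VOCAB except the keyword itself.
    choices = [[w for w in VOCAB if w != o] for o in originals]
    population = [[rng.choice(c) for c in choices] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism) ...
        population.sort(key=lambda cand: fitness(originals, cand), reverse=True)
        parents = population[: pop_size // 2]
        # ... and refill the rest with mutated copies: re-draw one keyword's substitute.
        children = []
        for parent in parents:
            child = list(parent)
            i = rng.randrange(len(child))
            child[i] = rng.choice(choices[i])
            children.append(child)
        population = parents + children
    return max(population, key=lambda cand: fitness(originals, cand))


print(evolve_substitutes(["cancer", "paris"]))
```

Because the fitter half always survives unchanged, the best fitness in the population never decreases between generations. A practical fitness would likely need to weigh more than embedding similarity, which this toy version ignores.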

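Research Motivation and Goals

Address the privacy risk that voice biometrics in cloud-based voice input systems can be used to re-identify users.
Prevent the cloud service from applying natural language processing to sensitive input content to profile users.
Guard against voice spoofing and impersonation attacks by keeping raw voice data away from the cloud service.
Preserve the usability and accuracy of cloud-based speech recognition even after the data has been sanitized.
Provide a practical on-device solution that runs efficiently on resource-constrained mobile platforms.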

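Proposed Method

A carefully designed voice conversion mechanism, resistant to several attacks, alters speaker characteristics while preserving the spoken content and its intelligibility.
An evolution-based keyword substitution technique replaces sensitive keywords in the voice input with semantically similar, anonymized substitutes.
Both sanitization phases, voice anonymization and content sanitization, run entirely on the mobile device, so raw data is never exposed to the cloud service.
The system integrates with existing cloud-based speech recognition pipelines without requiring any changes to the cloud service.
A Bloom-filter-based obfuscation technique (PRAKA) ensures that metadata leakage satisfies differential privacy.
A secure, lightweight architecture keeps the computational overhead on the mobile device minimal.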
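The summary does not say how the voice conversion is built. A common family of techniques for changing speaker characteristics while keeping the content is frequency warping of the spectrum. The sketch below assumes that approach and uses the standard bilinear (all-pass) warping function ω̃ = ω + 2·atan(α·sin ω / (1 − α·cos ω)) with |α| < 1. The function names and the toy spectrum are hypothetical. A real converter would apply this per frame to a smoothed spectral envelope inside an analysis/resynthesis loop, which is omitted here.

```python
import math


def bilinear_warp(omega, alpha):
    """Bilinear (all-pass) frequency warping of a normalized frequency omega in [0, pi].

    Maps 0 -> 0 and pi -> pi; for alpha > 0 interior frequencies move upward.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega), 1.0 - alpha * math.cos(omega))


def warp_spectrum(magnitudes, alpha):
    """Resample a magnitude spectrum (bins spanning 0..pi) at warped frequencies.

    Output bin k takes the input value at bilinear_warp(omega_k, alpha), using
    linear interpolation between neighbouring input bins.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    n = len(magnitudes)
    if n < 2:
        raise ValueError("need at least two frequency bins")
    out = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        pos = bilinear_warp(omega, alpha) / math.pi * (n - 1)
        pos = min(max(pos, 0.0), n - 1.0)  # clamp floating-point overshoot at the edges
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * magnitudes[lo] + frac * magnitudes[hi])
    return out


# Toy spectrum with a single peak ("formant") at bin 6.
spectrum = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
warped = warp_spectrum(spectrum, 0.2)
print(warped.index(max(warped)))  # prints 5 (the peak moves down from bin 6)
```

With α > 0 each output bin reads the input at a higher frequency, so spectral peaks move down; α < 0 moves them up, and warping with −α undoes warping with α. Drawing α at random per session (for example with `random.SystemRandom().uniform(...)` over a range that would have to be tuned) is one way to make the mapping harder to invert, but that is an assumption, not something this summary states.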

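Experimental Results

Research Questions

RQ1: Can voice input be sanitized on the mobile device to prevent users from being re-identified through speaker recognition?
RQ2: To what extent can sensitive content in voice input be protected without significantly degrading speech recognition accuracy?
RQ3: How effective is the hybrid approach, combining voice conversion with keyword substitution, at protecting privacy while preserving usability?
RQ4: Can the system maintain acceptable speech recognition performance under strong privacy guarantees?
RQ5: What is the trade-off between privacy protection and recognition accuracy in a real-world mobile deployment?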

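Key Findings

Through voice conversion, VoiceMask reduces the chance of a user's voice being identified among 50 people by 84%.
The system keeps the drop in speech recognition accuracy within 14.2%, preserving usability.
The evolution-based keyword substitution effectively replaces sensitive terms while preserving semantic meaning and contextual coherence.
The entire sanitization pipeline runs on the device, so raw voice data is never exposed to the cloud service.
The system is efficient and practical to deploy on Android mobile devices.
Combining voice conversion with content sanitization provides strong protection against leakage of both identity and content.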
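The headline numbers can be read in more than one way, and the summary does not say whether the reductions are relative or in absolute percentage points. The helper below, `relative_reduction` (a hypothetical name), makes the arithmetic explicit. Every input value is invented for illustration and none comes from the paper. For reference, random guessing among 50 speakers succeeds 1/50 = 2% of the time, and an accuracy fall from 90% to 80% is 10 percentage points but about an 11.1% relative drop.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after` (0.84 means an 84% relative reduction)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before


# All numbers below are hypothetical and NOT taken from the paper.
CHANCE_LEVEL = 1 / 50                # random guess among 50 enrolled speakers
id_before, id_after = 0.90, 0.144    # speaker-identification accuracy without / with masking
asr_before, asr_after = 0.90, 0.80   # speech recognition accuracy without / with masking

print(f"chance level: {CHANCE_LEVEL:.0%}")
print(f"speaker-ID reduction (relative): {relative_reduction(id_before, id_after):.0%}")
print(f"ASR drop (absolute): {(asr_before - asr_after) * 100:.1f} percentage points")
print(f"ASR drop (relative): {relative_reduction(asr_before, asr_after):.1%}")
```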

This review was generated by AI and reviewed by human editors.