QUICK REVIEW

[Paper Review] Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

Zhichuang Sun, Ruimin Sun|arXiv (Cornell University)|Feb 18, 2020

Advanced Malware Detection Techniques | References: 27 | Cited by: 26

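One-Sentence Summary

This large-scale study examined ML model protection in 46,753 Android apps and found that 41% of the ML apps store their models in plaintext, while models could be extracted from 66% of the apps that do encrypt them using simple dynamic analysis. The study shows that on-device models are widely exposed to theft, with serious financial and security consequences, and calls for robust and reliable on-device model protection.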

ABSTRACT

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? What impacts can (stolen) models incur? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial and security impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.

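Motivation and Objectives

Investigate how widely model protection is used among popular mobile apps in the US and Chinese markets that run on-device machine learning.
Evaluate how robust existing model protection techniques are against unsophisticated dynamic analysis attacks.
Quantify the financial and security impact of model leakage on vendors and attackers.
Highlight the urgent need for standardized, practical, and reliable model protection mechanisms on mobile platforms.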

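Proposed Method

Built an automated static analysis pipeline to detect ML framework and model usage in Android app packages.
Identified 1,468 ML apps in a dataset of 46,753 popular apps collected from the US and Chinese app markets.
Used dynamic analysis with memory instrumentation to extract decrypted models from running apps.
Tracked model reuse across apps by identifying shared model files and their deployment patterns.
Applied reverse engineering and runtime memory inspection to extract models even when they are encrypted.
Conducted a financial and security impact analysis based on R&D cost, market competition, and the risk of adversarial evasion.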
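
The static detection step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the suffix list `MODEL_SUFFIXES`, the library-name hints `FRAMEWORK_HINTS`, the function `scan_apk`, and the entropy threshold of 7.9 bits per byte are all assumptions chosen for illustration. The idea is that an APK is a ZIP archive, so its entries can be listed, framework libraries and model files can be matched by name, and a byte-entropy estimate gives a rough hint as to whether a model file is encrypted (encrypted or compressed data tends toward 8 bits per byte).

```python
import math
import zipfile
from collections import Counter

# Hypothetical indicator lists, chosen for illustration only.
MODEL_SUFFIXES = (".tflite", ".lite", ".pb", ".caffemodel", ".onnx")
FRAMEWORK_HINTS = ("tensorflow", "caffe", "ncnn", "mnn")


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scan_apk(apk, entropy_threshold: float = 7.9) -> dict:
    """List likely ML framework libraries and model files in an APK.

    `apk` is a path or a binary file-like object. The threshold is an
    assumed heuristic: high entropy only suggests encryption or compression.
    """
    findings = {"framework_libs": [], "models": []}
    with zipfile.ZipFile(apk) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith(".so") and any(h in lower for h in FRAMEWORK_HINTS):
                findings["framework_libs"].append(name)
            elif lower.endswith(MODEL_SUFFIXES):
                entropy = shannon_entropy(zf.read(name))
                findings["models"].append({
                    "path": name,
                    "entropy": entropy,
                    "likely_encrypted": entropy >= entropy_threshold,
                })
    return findings
```

Calling `scan_apk("app.apk")` (a hypothetical file name) returns a dictionary of matched libraries and model files. A name-based match misses models with renamed or custom extensions, so a real audit would need content-based checks as well.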

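Experimental Results

Research Questions

RQ1: How widely is model protection used in mobile apps that run on-device machine learning?
RQ2: How robust are existing model protection mechanisms against dynamic memory extraction attacks?
RQ3: What financial and security impact does model leakage have on attackers and model vendors?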

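Key Findings

Of the 1,468 ML apps analyzed, 41% do not protect their models at all and ship them in plaintext inside the app package.
Among apps that use encryption, models could be extracted from runtime memory in 66% of cases using basic dynamic analysis.
In total, 18 unique models were extracted and found to be reused across 347 different apps, indicating that protected models are widely reused.
Even models protected by multiple layers of encryption or obfuscation were successfully extracted from memory in plaintext.
The financial impact of a leaked model can reach millions of dollars, owing to lost R&D investment and competitive advantage.
Stolen models can be used to mount adversarial attacks, such as bypassing face recognition or liveness detection, which poses serious security risks.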
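
Encryption at rest does not help once a model is decrypted into memory, because a decrypted model keeps its recognizable file structure. The sketch below illustrates this for a defender auditing their own app. It is not the paper's tooling, and the function name `find_tflite_candidates` is hypothetical. It relies on one known fact: TensorFlow Lite models are FlatBuffers files whose 4-byte file identifier `TFL3` sits at byte offset 4, right after the root table offset. The sketch scans a byte buffer, for example a memory dump the auditor has already obtained, for that identifier.

```python
TFLITE_IDENTIFIER = b"TFL3"  # FlatBuffers file identifier of TensorFlow Lite models
IDENTIFIER_OFFSET = 4        # the identifier follows the 4-byte root table offset


def find_tflite_candidates(buffer: bytes) -> list:
    """Return offsets in `buffer` where a TensorFlow Lite model may start.

    A hit is only a candidate: the identifier can also occur by chance,
    so each offset would still need to be validated by parsing.
    """
    starts = []
    pos = buffer.find(TFLITE_IDENTIFIER)
    while pos != -1:
        # Skip hits too close to the start to have a preceding root offset.
        if pos >= IDENTIFIER_OFFSET:
            starts.append(pos - IDENTIFIER_OFFSET)
        pos = buffer.find(TFLITE_IDENTIFIER, pos + 1)
    return starts
```

If a scan like this finds a candidate in a dump of a running app whose packaged model is encrypted, the plaintext model was exposed at runtime, which is the weakness the findings above describe.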


This review was generated by AI and reviewed by a human editor.