QUICK REVIEW

[Paper Review] Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

Zhichuang Sun, Ruimin Sun | arXiv (Cornell University) | 2020-02-18

Advanced Malware Detection Techniques | References: 27 | Citations: 26

One-line Summary

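This large-scale study analyzed machine learning (ML) model protection in 46,753 Android apps collected from the US and Chinese app markets. It found that 41% of ML apps store their models in plaintext, and that models could be extracted through simple dynamic analysis from 66% of the apps that protect or encrypt them. The results reveal widespread exposure to model theft, with potential financial and security consequences, and the authors argue that robust model protection is urgently needed in practice.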

ABSTRACT

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? What impacts can (stolen) models incur? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial and security impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.

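Research Motivation and Goals

Survey how widely ML model protection is used in popular mobile apps from the US and Chinese markets.
Evaluate how robust existing model protection techniques are against unsophisticated dynamic analysis attacks.
Quantify the financial and security impact of model theft for both model vendors and attackers.
Highlight the urgent need for standardized, practical, and robust model protection mechanisms on mobile platforms.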

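Proposed Method

Built an automated static analysis pipeline that detects ML frameworks and model files inside Android app packages.
Identified 1,468 ML apps in a dataset of 46,753 popular apps collected from the US and Chinese app markets.
Extracted decrypted models from running apps through dynamic analysis based on memory instrumentation.
Tracked model reuse across apps by identifying shared model files and their distribution patterns.
Applied reverse engineering and runtime memory inspection to extract even encrypted models.
Analyzed financial and security impact based on R&D investment costs, market competitiveness, and the risk of malicious attacks.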

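Experimental Results

Research Questions

RQ1: How widely are model protection techniques used by mobile apps that perform on-device machine learning?
RQ2: How robust are existing model protection techniques against dynamic memory extraction attacks?
RQ3: What financial and security impacts does model leakage have for attackers and model vendors?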

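Key Findings

Of the 1,468 ML apps analyzed, 41% do not protect their models at all and store them in plaintext inside the app package.
Even among apps that use encryption, models could be extracted from runtime memory in 66% of cases using basic dynamic analysis techniques.
A total of 18 unique models were extracted, and these were found to be shared across 347 different apps, suggesting widespread reuse of protected models.
Even models protected with multiple layers of encryption or obfuscation were successfully extracted from memory in plaintext.
The financial impact of model leakage can reach millions of dollars through lost R&D investment and lost competitive advantage.
Stolen models can enable malicious attacks that bypass face recognition or liveness detection (biometric anti-spoofing), posing serious security threats.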
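The first finding is reported only as a percentage. Assuming, as stated, that the 41% is taken over all 1,468 ML apps, it translates into an approximate app count:

```latex
0.41 \times 1{,}468 = 601.88 \approx 602 \ \text{apps}
```

Because 41% is itself a rounded figure, this should be read as "roughly 600 apps" rather than an exact count.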


This review was generated by AI and reviewed by a human editor.