QUICK REVIEW

[Paper Review] DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models

Bita Darvish Rouhani, Huili Chen|arXiv (Cornell University)|2018. 04. 02.

Adversarial Robustness in Machine Learning | Citations: 84

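One-Line Summary

DeepSigns embeds robust digital watermarks in deep learning models by shaping the activation distributions of intermediate layers and by post-processing the output layer, enabling proof of IP ownership in both white-box and black-box settings while preserving model accuracy and resisting common attacks.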

ABSTRACT

Deep Learning (DL) models have caused a paradigm shift in our ability to comprehend raw data in various important fields, ranging from intelligence warfare and healthcare to autonomous transportation and automated manufacturing. A practical concern, in the rush to adopt DL models as a service, is protecting the models against Intellectual Property (IP) infringement. The DL models are commonly built by allocating significant computational resources that process vast amounts of proprietary training data. The resulting models are therefore considered to be the IP of the model builder and need to be protected to preserve the owner's competitive advantage. This paper proposes DeepSigns, a novel end-to-end IP protection framework that enables insertion of coherent digital watermarks in contemporary DL models. DeepSigns, for the first time, introduces a generic watermarking methodology that can be used for protecting DL owner's IP rights in both white-box and black-box settings, where the adversary may or may not have the knowledge of the model internals. The suggested methodology is based on embedding the owner's signature (watermark) in the probability density function (pdf) of the data abstraction obtained in different layers of a DL model. DeepSigns can demonstrably withstand various removal and transformation attacks, including model compression, model fine-tuning, and watermark overwriting. Proof-of-concept evaluations on MNIST, and CIFAR10 datasets, as well as a wide variety of neural network architectures including Wide Residual Networks, Convolution Neural Networks, and Multi-Layer Perceptrons corroborate DeepSigns' effectiveness and applicability.

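Motivation and Objectives

Promote IP protection for deep learning models as their deployment as a service grows.
Propose a generic watermarking framework that works in both white-box and black-box settings.
Embed watermarks in the activation distributions of hidden layers and, after training, in the output layer without harming accuracy.
Demonstrate robustness against model compression, fine-tuning, and watermark overwriting.
Provide practical metrics and an API to encourage adoption across diverse architectures.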

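Proposed Method

Embed an N-bit watermark string in the means of the per-layer activation distributions (Gaussian mixture model prior).
Add a term (loss1) to the training loss that aligns the selected activation means with the selected Gaussian centers, and include a second term (loss2) that, through SGD optimization, drives the binarized projection of the activation features toward the watermark bits.
Use a random projection matrix A and a sigmoid followed by hard thresholding to map the selected Gaussian centers to the watermark bits (b).
Jointly optimize loss0 (classification), loss1 (GMM alignment), and loss2 (watermark bit alignment) during training so that the watermark is embedded without harming accuracy.
Watermark the output layer as a post-processing step: generate K input keys from the tail regions of the class-conditional distributions, then fine-tune on those keys to enforce correct labeling of the key samples.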
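
The white-box extraction step described above (project the Gaussian centers with A, apply a sigmoid, then hard-threshold) can be illustrated with a minimal sketch. The matrix layout (s centers of dimension M, an M x N projection), the 0.5 threshold, the function names, and the use of bit error rate as the comparison metric are assumptions made for this illustration, not details confirmed by this review; the toy numbers are hand-picked.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def extract_watermark(centers, proj, threshold=0.5):
    """Map Gaussian centers to bits: project, apply sigmoid, hard-threshold.

    centers: s rows of length M (assumed: one mean activation vector per
             watermarked Gaussian center)
    proj:    M rows of length N (assumed layout of the projection matrix A)
    """
    n_bits = len(proj[0])
    bits = []
    for mu in centers:
        row = []
        for j in range(n_bits):
            z = sum(mu[i] * proj[i][j] for i in range(len(mu)))
            row.append(1 if sigmoid(z) >= threshold else 0)
        bits.append(row)
    return bits


def bit_error_rate(extracted, expected):
    """Fraction of extracted bits that differ from the owner's signature."""
    pairs = [(e, b)
             for e_row, b_row in zip(extracted, expected)
             for e, b in zip(e_row, b_row)]
    return sum(1 for e, b in pairs if e != b) / len(pairs)


# Toy values chosen by hand for illustration only (hypothetical data).
centers = [[1.0, -2.0], [0.5, 1.0]]
proj = [[1.0, -1.0, 0.5], [0.0, 1.0, 1.0]]
signature = [[1, 0, 0], [1, 1, 0]]

extracted = extract_watermark(centers, proj)
print(extracted)
print(bit_error_rate(extracted, signature))
```

In practice the centers would be estimated from the activations of the watermarked layer on the owner's chosen inputs; here they are hard-coded so the sketch stays self-contained.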

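Experimental Results

Research Questions

RQ1 Can a generic watermarking framework protect DL model ownership in both white-box and black-box deployments?
RQ2 Is it possible to embed robust watermarks across diverse architectures such as MLP, CNN, ResNet, and WideResNet without degrading the accuracy of the underlying model?
RQ3 How robust is the watermark to common DL model transformations such as pruning, fine-tuning, and overwriting?
RQ4 Can watermark extraction reliably verify ownership across settings with a low false-positive rate and reasonable detection thresholds?
RQ5 What practical metrics and API support are needed to enable adoption in real-world DL practice?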

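Key Results

DeepSigns can embed watermarks by inserting binary information into the probability density function (pdf) of intermediate activations and into the output layer, without degrading prediction accuracy on the evaluated models.
The framework demonstrates robustness against pruning, fine-tuning, and watermark overwriting in extensive experiments on MNIST, CIFAR-10, and multiple architectures (MLP, CNN, WideResNet).
It offers a dual watermarking approach: functional watermarking in the hidden layers (Gaussian centers and projection) and output-layer watermarking through post-training, key-based triggering.
The method applies to both white-box and black-box settings and provides high detection capability, with carefully chosen keys and thresholds serving as the mechanism for controlling false positives.
It proposes an API and a set of evaluation metrics to ease adoption and comparison with future DL watermarking methods.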
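
To make the role of keys and thresholds in black-box verification concrete, here is a rough sketch of a mismatch-count decision rule. The function names, the threshold parameter, and especially the binomial model (an unrelated model matching each key independently with a fixed probability) are assumptions of this illustration, not the paper's stated procedure.

```python
from math import comb


def count_mismatches(predicted, key_labels):
    """Number of key inputs whose predicted label differs from the key label."""
    return sum(1 for p, y in zip(predicted, key_labels) if p != y)


def verify_ownership(predicted, key_labels, max_mismatches):
    """Claim ownership when the queried model errs on few enough keys."""
    return count_mismatches(predicted, key_labels) <= max_mismatches


def false_positive_probability(num_keys, max_mismatches, match_prob):
    """Chance that an unrelated model passes verification.

    Assumes (for illustration) that an unrelated model matches each key
    independently with probability match_prob, so the number of matches
    is binomially distributed.
    """
    min_matches = num_keys - max_mismatches
    return sum(
        comb(num_keys, k) * match_prob ** k * (1 - match_prob) ** (num_keys - k)
        for k in range(min_matches, num_keys + 1)
    )


# Hypothetical labels returned by a remote model for K = 5 key inputs.
key_labels = [3, 1, 4, 1, 6]
predicted = [3, 1, 4, 1, 5]

print(verify_ownership(predicted, key_labels, max_mismatches=1))
print(false_positive_probability(num_keys=5, max_mismatches=1, match_prob=0.1))
```

Under this simplified model, increasing the number of keys or lowering the allowed mismatches reduces the chance that an unrelated model is falsely flagged, at the cost of tolerating fewer errors from a legitimately watermarked but modified model.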

This review was generated by AI and reviewed by a human editor.