QUICK REVIEW

[Paper Review] How to Prove Your Model Belongs to You: A Blind-Watermark based Framework to Protect Intellectual Property of DNN

Zheng Li, Chengyu Hu | arXiv (Cornell University) | 2019-03-05

Adversarial Robustness in Machine Learning | References: 27 | Citations: 57

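One-Line Summary

This paper presents a blind-watermark IPP framework that embeds key samples, nearly indistinguishable from ordinary inputs, into a DNN to prove ownership, defend against evasion attacks, and resist fraudulent ownership claims. It reports strong experimental results across two datasets (MNIST and CIFAR-10) and 15 architectures.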

ABSTRACT

Deep learning techniques have made tremendous progress in a variety of challenging tasks, such as image recognition and machine translation, during the past decade. Training deep neural networks is computationally expensive and requires both human and intellectual resources. Therefore, it is necessary to protect the intellectual property of the model and externally verify the ownership of the model. However, previous studies either fail to defend against the evasion attack or have not explicitly dealt with fraudulent claims of ownership by adversaries. Furthermore, they can not establish a clear association between the model and the creator's identity. To fill these gaps, in this paper, we propose a novel intellectual property protection (IPP) framework based on blind-watermark for watermarking deep neural networks that meet the requirements of security and feasibility. Our framework accepts ordinary samples and the exclusive logo as inputs, outputting newly generated samples as watermarks, which are almost indistinguishable from the origin, and infuses these watermarks into DNN models by assigning specific labels, leaving the backdoor as the basis for our copyright claim. We evaluated our IPP framework on two benchmark datasets and 15 popular deep learning models. The results show that our framework successfully verifies the ownership of all the models without a noticeable impact on their primary task. Most importantly, we are the first to successfully design and implement a blind-watermark based framework, which can achieve state-of-art performances on undetectability against evasion attack and unforgeability against fraudulent claims of ownership. Further, our framework shows remarkable robustness and establishes a clear association between the model and the author's identity.

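Motivation and Objectives

Establishes the need to protect the intellectual property of DNNs and to address the limitations of existing watermarking methods.
Proposes a blind-watermark based IPP framework that links a model to its creator's identity.
Demonstrates feasibility and practicality through a prototype and an empirical evaluation across multiple datasets and architectures.
Evaluates robustness against evasion attacks and fraudulent ownership claims.
Shows whether the watermark enables reliable ownership verification while keeping its impact on the model's main-task performance minimal.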

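Proposed Method

To embed the watermark, the framework generates key samples of the form x^key = G(e, x, l), where x is an ordinary sample, e is a lightweight encoder, and l is the exclusive logo; a training setup built on an autoencoder and a discriminator aims to align the key-sample distribution P_e with the data distribution P_data.
An adversarial (discriminator) objective is used to minimize the KL divergence between P_data and P_e, and an SSIM-based reconstruction loss is introduced to keep key samples indistinguishable from the originals.
A backdoor is embedded in the host DNN so that x^key maps to a predefined label t^key, which enables ownership verification through high accuracy on the key samples.
A verification procedure is provided in which the owner queries the remote model with key samples and checks whether acc_g(x^key, t^key) exceeds a threshold.
A joint objective O_e that combines reconstruction fidelity, SSIM, and adversarial loss is presented in detail to guide the training of the encoder, discriminator, and host model.
The overall pipeline, comprising the encoder, discriminator, and host DNN, is outlined along with the training protocol and hyperparameter settings.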
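
The verification step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy models, and the 0.9 threshold are all assumptions made for illustration, and the review does not state which threshold the paper uses. The remote model is treated as a black-box `predict` callable, matching the setting in which the owner can only query it.

```python
from typing import Callable, Hashable, Sequence

Sample = Hashable  # stand-in for an image; any input type works


def key_sample_accuracy(
    predict: Callable[[Sample], int],
    key_samples: Sequence[Sample],
    key_labels: Sequence[int],
) -> float:
    """Fraction of key samples that the queried model maps to their key labels."""
    if not key_samples or len(key_samples) != len(key_labels):
        raise ValueError("need equally many key samples and key labels (at least one)")
    hits = sum(1 for x, t in zip(key_samples, key_labels) if predict(x) == t)
    return hits / len(key_samples)


def verify_ownership(
    predict: Callable[[Sample], int],
    key_samples: Sequence[Sample],
    key_labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> bool:
    """Claim ownership only if key-sample accuracy exceeds the threshold."""
    return key_sample_accuracy(predict, key_samples, key_labels) > threshold


# Toy stand-ins (assumptions): a sample is (index, tag); tag "key" marks a key sample.
KEY_LABEL = 3
key_samples = [(i, "key") for i in range(10)]
key_labels = [KEY_LABEL] * len(key_samples)


def watermarked_model(sample):
    # Backdoored behaviour: every key sample is mapped to the predefined label.
    return KEY_LABEL if sample[1] == "key" else sample[0] % 10


def clean_model(sample):
    # A model without the backdoor ignores the tag entirely.
    return sample[0] % 10


owner_claim = verify_ownership(watermarked_model, key_samples, key_labels)
unrelated_claim = verify_ownership(clean_model, key_samples, key_labels)
```

In this toy setup the backdoored model labels every key sample with `KEY_LABEL`, so the check passes, while the clean model matches the key label only by coincidence and fails it. In practice the threshold would have to be chosen so that a model without the backdoor is very unlikely to pass by chance.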

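Experimental Results

Research Questions

RQ1: Can the blind-watermark IPP framework reliably prove ownership of a DNN while preserving fidelity on the original task?
RQ2: Does the proposed framework resist evasion attacks and fraudulent ownership claims better than existing watermarking methods?
RQ3: Is the watermark distribution close enough to the original data distribution to be imperceptible to humans while remaining robust?
RQ4: Can a clear association between the model and its creator's identity be established in practical scenarios?
RQ5: How does watermark embedding affect model accuracy across different architectures and datasets?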

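Key Findings

Watermarked models retain accuracy comparable to non-watermarked models; the fidelity drop averages 0.66% and is as low as 0.14%.
Key samples achieve high verification accuracy: watermarked models exceed 90% accuracy on key samples and sometimes reach 100%.
The blind-watermark approach achieves undetectability against evasion attacks: in extended tests, detectors perform close to random guessing (AUC of roughly 0.5 to 0.65).
The framework makes valid key samples hard to forge under reasonable assumptions, demonstrating robustness against fraudulent ownership claims.
In experiments with 15 host DNNs on MNIST and CIFAR-10, ownership verification succeeded with limited impact on the main task.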
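
To make the detection result above concrete, the sketch below shows why an AUC near 0.5 means a detector cannot tell key samples from ordinary ones. The scores are synthetic values invented for illustration, not data from the paper, and the helper `auc` is a hypothetical name; it computes AUC as the probability that a randomly chosen key sample receives a higher detector score than a randomly chosen ordinary sample, with ties counted as one half.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the pairwise win rate of positives over negatives (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Synthetic detector scores (assumptions, not results from the paper).
# Case 1: a visible watermark, where key samples score clearly higher.
separable_auc = auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])

# Case 2: a blind watermark, where key and ordinary samples score alike.
blind_auc = auc([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])
```

With these synthetic scores the first detector separates the two groups perfectly (AUC of 1.0), while the second is at chance level (AUC of 0.5). The range reported in the review, roughly 0.5 to 0.65, sits close to the chance end of that scale.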


This review was generated by AI and reviewed by a human editor.