QUICK REVIEW

[논문 리뷰] Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey

Shang Wang, Tianqing Zhu|arXiv (Cornell University)|2024. 06. 12.

Privacy-Preserving Technologies in Data인용 수 7

한 줄 요약

이 설문은 pre-training, fine-tuning, RAG 시스템, 배포, 및 LLM 기반 에이전트에 걸친 대형 언어 모델(LLMs)의 고유한 프라이버시 및 보안 위협에 대한 다섯 가지 시나리오 분류법을 제공하며, 위협과 대응책을 개략적으로 제시한다.

ABSTRACT

With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with four specific scenarios: pre-training, fine-tuning, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines and analyzes potential countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.

연구 동기 및 목표

전통적인 모델과 비교했을 때 LLM에 고유한 프라이버시 및 보안 위험을 동기 부여하고 분석한다.
다섯 가지 생애 주기 시나리오에 맞춘 세부적인 위협 모델의 분류를 제안한다.
LLM 특유의 위험과 언어 모델 전반의 일반적 위험을 식별하고 논의한다.
기존 대응책을 요약하고 향후 방어 연구의 방향을 제시한다.

제안 방법

LLM 생애 주기를 다섯 가지 위협 시나리오로 구성한다: pre-training, fine-tuning, retrieval-augmented generation (RAG), deployment, 및 LLM 기반 에이전트.
각 시나리오 내 위협의 분류를 제공하고, 공격 목표, 능력, 방법을 자세히 설명한다.
각 위험을 대응하는 대응책 및 방어책에 매핑한다.
프라이버시 및 보안 위험을 federated learning, machine unlearning, watermarking의 세 가지 추가 시나리오와 함께 검토한다.
고유 LLM 위험과 언어 모델 전반의 일반적 위험을 대조하고 방어 전략을 논의한다.

실험 결과

연구 질문

RQ1What unique privacy risks arise for LLMs during pre-training, fine-tuning, RAG, deployment, and agent deployment?
RQ2What are the corresponding security threats and their attacker models in each life-cycle stage?
RQ3What countermeasures exist or are feasible to mitigate these LLM-specific risks across the five scenarios, plus federated learning, unlearning, and watermarking?
RQ4How do these risks differ from traditional language models, and what research directions are suggested to advance robust defenses?

주요 결과

LLMs introduce unique privacy risks such as memorization of training data and white-box data extraction when exposed to open interfaces.
LLMs face unique security risks including backdoors, poisoning, and jailbreaks tied to instruction-tuning and alignment processes.
RAG systems introduce risks via poisoned knowledge bases and jailbreak prompts targeting knowledge owners’ privacy.
Deployment of LLMs entails prompt-based attacks and prompt-stealing risks, with both unique and common model-level vulnerabilities.
LLM-based agents introduce autonomous risk due to their interactions and potential backdoors, requiring defense through guardrails and secure agent design.
The survey proposes taxonomy-driven mappings between risks and countermeasures across scenarios, highlighting research directions for safer LLM use.

더 나은 연구,지금 바로 시작하세요

연구 설계부터 논문 작성까지, 연구 시간을 획기적으로 줄여보세요.

카드 등록 없음 · 무료 플랜 제공

이 리뷰는 AI가 만들고, 인간 에디터가 검토했습니다.