QUICK REVIEW

[Paper Review] Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng|arXiv (Cornell University)|2023. 06. 08.

Topic Modeling | Citations: 75

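One-line summary

The paper analyzes prompt injection risks in real-world LLM-integrated applications and introduces HouYi, a black-box attack framework that, evaluated on 36 services, succeeds against 31 of them (86.1%), highlighting the potential for prompt theft and abuse of LLM compute.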

ABSTRACT

Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.

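Research motivation and goals

Understand the practical limitations of existing prompt injection techniques in real-world LLM-integrated applications.
Develop HouYi, a black-box prompt injection methodology inspired by SQL injection and XSS attacks.
Demonstrate the attack's feasibility against live services and quantify the risk, including the potential financial impact.
Provide insights for the robust design and defense of LLM-integrated applications.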

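Proposed method

Survey 10 real-world LLM-integrated applications and run a pilot study to evaluate the baseline effectiveness of existing prompt injection techniques.
Develop HouYi, whose payload consists of three components (Framework Component, Separator Component, Disruptor Component) and injects a malicious prompt while mimicking a legitimate one.
Refine payloads iteratively using HouYi's three-phase workflow: Context Inference, Payload Generation, and Feedback.
Evaluate HouYi on 36 LLM-integrated applications, measuring the success rate and analyzing failure cases.
Assess defenses from open-source projects and their limitations against HouYi-generated payloads.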
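The three-component payload and the generate-then-check loop described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the names `Payload`, `render`, `refine`, `send`, `succeeded`, and `max_rounds` are all hypothetical, the strings are neutral placeholders, and the Context Inference phase (which the paper uses to study the target application first) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Payload:
    """Hypothetical container for the three components named in the review."""
    framework: str   # input that looks like normal use of the application
    separator: str   # text meant to mark a break from the preceding context
    disruptor: str   # the part carrying the tester's actual objective

    def render(self) -> str:
        # Assumption: the parts are simply concatenated in order.
        return "\n\n".join([self.framework, self.separator, self.disruptor])


def refine(
    candidates: Iterable[Payload],
    send: Callable[[str], str],
    succeeded: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[Payload]:
    """Try candidates in order; return the first one judged successful.

    `send` and `succeeded` are hypothetical callables standing in for the
    application under test and the feedback check.
    """
    for round_no, payload in enumerate(candidates):
        if round_no >= max_rounds:
            break
        response = send(payload.render())
        if succeeded(response):
            return payload
    return None


if __name__ == "__main__":
    # Mock application that just echoes its input; no real service is contacted.
    demo = Payload("<normal question>", "<separator text>", "<test objective>")
    result = refine([demo], send=lambda text: text,
                    succeeded=lambda reply: "<test objective>" in reply)
    print(result is demo)
```

Keeping the three parts as separate fields makes it easy to vary one component at a time between rounds, which is the kind of iteration the Feedback phase implies.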

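Experimental results

Research questions

RQ1: What are the patterns and limitations of existing prompt injection attacks on real-world LLM-integrated applications?
RQ2: How vulnerable are current systems when exposed to black-box prompt injection techniques?
RQ3: How effective is the proposed HouYi framework across different application categories?
RQ4: Which defenses or design principles can mitigate prompt injection risks?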

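Key results

In the pilot on 10 commercial applications, existing prompt injection methods had limited success because the applications use prompts in diverse ways and apply defensive formatting.
HouYi achieved an 86.1% success rate across the 36 tested LLM-integrated applications, enabling outcomes such as prompt theft and unauthorized use of LLM compute.
The attack can exploit context separation and lead the LLM to treat the malicious payload as a question to answer, although defenses based on format rules or multi-step processes can limit its effectiveness.
Ten vendors, including Notion, confirmed the findings, which suggests a potential impact on millions of users and possible financial losses in the millions of dollars.
Defenses from open-source projects mitigate some attacks but remain vulnerable to HouYi-generated payloads, underscoring the need for more robust defenses.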
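The 86.1% figure can be checked against the counts given in the abstract, which reports 31 susceptible applications out of the 36 tested:

```latex
\[
\text{success rate} = \frac{31}{36} \approx 0.861 = 86.1\%
\]
```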

This review was generated by AI and reviewed by a human editor.