QUICK REVIEW

[Paper Review] Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Manish Bhatt, Sahana Chennabasappa | arXiv (Cornell University) | 2023-12-07

Artificial Intelligence in Healthcare and Education | Citations: 15

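One-line summary

CyberSecEval is a comprehensive benchmark that evaluates LLMs on insecure code generation across eight programming languages and on their compliance with prompts that ask for help with cyberattacks. It includes a case study of seven models from the Llama 2, Code Llama, and OpenAI GPT families.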

ABSTRACT

This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety benchmark to date, CyberSecEval provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks. Through a case study involving seven models from the Llama 2, Code Llama, and OpenAI GPT large language model families, CyberSecEval effectively pinpointed key cybersecurity risks. More importantly, it offered practical insights for refining these models. A significant observation from the study was the tendency of more advanced models to suggest insecure code, highlighting the critical need for integrating security considerations in the development of sophisticated LLMs. CyberSecEval, with its automated test case generation and evaluation pipeline covers a broad scope and equips LLM designers and researchers with a tool to broadly measure and enhance the cybersecurity safety properties of LLMs, contributing to the development of more secure AI systems.

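Research motivation and goals

Motivate and measure the cybersecurity risks of LLMs used as coding assistants.
Develop an automated test suite that detects insecure coding practices across multiple languages.
Evaluate whether LLMs comply when asked to assist in cyberattacks, and identify safety weaknesses.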

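Proposed method

Develop an Insecure Code Detector (ICD) with 189 static analysis rules covering 50 CWEs across eight languages.
Automatically generate test prompts from insecure code for both the autocomplete and the instruction settings.
Write prompts by hand and augment them with Llama-70b-chat to build cyberattack-helpfulness tests that judge malicious usefulness.
Evaluate LLM outputs for insecure code and cyberattack helpfulness, using static analysis and a judge LLM, and compute precision and recall for the detectors.
Apply the benchmark in a case study of seven models from the Llama 2, Code Llama, and OpenAI GPT families.
Provide open-source tools and test cases, available from the project repository.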
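
The first two steps above (rule-based detection, then turning insecure code into an autocomplete prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ICD itself: the two regex rules, the names `Rule`, `find_violations`, and `build_autocomplete_prompt`, and the 10-line context window are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    cwe: str             # CWE identifier the rule maps to
    language: str        # language the rule applies to
    pattern: re.Pattern  # regex that flags the insecure practice


# Hypothetical rules for illustration only; not the ICD's actual rule set.
RULES = [
    Rule("CWE-328", "python", re.compile(r"\bhashlib\.md5\s*\(")),  # weak hash
    Rule("CWE-120", "c", re.compile(r"\bstrcpy\s*\(")),             # unchecked buffer copy
]


def find_violations(code: str, language: str) -> list[tuple[int, str]]:
    """Return (1-based line number, CWE id) for every rule match in `code`."""
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for rule in RULES:
            if rule.language == language and rule.pattern.search(line):
                hits.append((lineno, rule.cwe))
    return hits


def build_autocomplete_prompt(code: str, insecure_lineno: int, window: int = 10) -> str:
    """Use the lines just before the flagged line as a completion prompt.

    The window size is an assumption made for this sketch.
    """
    lines = code.splitlines()
    start = max(0, insecure_lineno - 1 - window)
    return "\n".join(lines[start:insecure_lineno - 1])


SAMPLE = """import hashlib

def fingerprint(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()
"""

if __name__ == "__main__":
    for lineno, cwe in find_violations(SAMPLE, "python"):
        print(f"line {lineno}: {cwe}")
        print(build_autocomplete_prompt(SAMPLE, lineno))
```

The same `find_violations` check can then be run on a model's completion of the prompt: if the completion triggers a rule, the model reproduced the insecure practice for that test case.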

Figure 1: High-level overview of CyberSecEval's approach.

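Experimental results

Research questions

RQ1: Do LLMs generate insecure code when asked to complete code or to write code from an instruction, and how often does this happen for each language and model type?
RQ2: Do LLMs comply with requests to assist in cyberattacks, and does higher coding ability correlate with higher compliance?
RQ3: Can automated static-analysis-based detection and LLM-based judging accurately measure the cybersecurity safety properties of LLMs?

Key results

LLMs suggested insecure coding practices about 30% of the time across test cases.
Code Llama models with stronger coding ability tended to generate more insecure code and to comply more readily with cyberattack prompts.
Cyberattack compliance averaged 53% across models and threat categories.
The Insecure Code Detector achieved 96% precision and 79% recall overall at detecting insecure code generated by LLMs.
Cyberattack-helpfulness detection achieved 94% precision and 84% recall at identifying responses that would be useful to a cyber attacker.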

Figure 2: The precision and recall of our Insecure Code Detector static analyzer at detecting insecure code in LLM completions.
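
For readers who want to see how detector figures such as the precision and recall above are derived, here is a minimal sketch. It assumes a set of completions that were labeled by hand (ground truth) and also flagged or not flagged by a detector. The function name `precision_recall` and the toy labels are made up for illustration and are not data from the paper.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) for binary labels.

    y_true: ground-truth labels (True = actually insecure)
    y_pred: detector output     (True = flagged as insecure)
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # correctly flagged
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)  # flagged but secure
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # insecure but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    truth = [True, True, True, False, False]
    flagged = [True, True, False, False, False]
    print(precision_recall(truth, flagged))
```

High precision with lower recall, the pattern reported for the Insecure Code Detector, means that what the detector flags is almost always truly insecure, while some insecure completions go unflagged.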

This review was generated by AI and checked by a human editor.