QUICK REVIEW

[Paper Review] AI Safety in Generative AI Large Language Models: A Survey

Jaymari Chua, Yun Li | arXiv (Cornell University) | 2024-07-06

Explainable Artificial Intelligence (XAI) | Citations: 8

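One-Line Summary

A computer-science-oriented survey that uses a component-based framework to classify the AI safety risks of generative AI large language models (GAI-LLMs) and links those risks to model training, prompting, and alignment methods.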

ABSTRACT

Large Language Model (LLMs) such as ChatGPT that exhibit generative AI capabilities are facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research of GAI-LLMs from a computer scientist's perspective: specific and technical. In this survey, we explore the background and motivation for the identified harms and risks in the context of LLMs being generative language models; our survey differentiates by emphasising the need for unified theories of the distinct safety challenges in the research development and applications of LLMs. We start our discussion with a concise introduction to the workings of LLMs, supported by relevant literature. Then we discuss earlier research that has pointed out the fundamental constraints of generative models, or lack of understanding thereof (e.g., performance and safety trade-offs as LLMs scale in number of parameters). We provide a sufficient coverage of LLM alignment -- delving into various approaches, contending methods and present challenges associated with aligning LLMs with human preferences. By highlighting the gaps in the literature and possible implementation oversights, our aim is to create a comprehensive analysis that provides insights for addressing AI safety in LLMs and encourages the development of aligned and secure models. We conclude our survey by discussing future directions of LLMs for AI safety, offering insights into ongoing research in this critical area.

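Research Motivation and Objectives

Provide a structured overview of the safety harms and risks of GAI-LLMs from the perspectives of data, model, prompt, alignment, and scale.
Develop a component-based taxonomy that maps safety issues onto the LLM architecture and workflow.
Analyze how the identified risks correlate with core LLM methodologies such as in-context learning, prompting, and reinforcement learning from human feedback.
Identify evaluation frameworks and gaps in order to guide the safe future development of GAI-LLMs.
Discuss future directions for AI safety research and practice in large language models.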

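Proposed Method

Conduct a literature-driven survey of AI safety for GAI-LLMs from a computer science/NLP perspective.
Propose a five-category taxonomy: Data Safety, Model Safety, Prompt Safety, Alignment, and Safety at Scale.
Analyze safety risks in relation to LLM methodologies (in-context learning, prompting, reinforcement learning).
Review and reference the evaluation frameworks and governance materials used in safety evaluation (e.g., HELM, BigBench).
Through comparison with other studies, emphasize a component-based view that traces safety issues to their origin.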

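Experimental Results

Research Questions

RQ1: What are the main safety risks associated with generative AI large language models?
RQ2: How can LLM safety concerns be systematically categorized across data, model, prompt, alignment, and scale?
RQ3: How do the identified risks map to specific LLM methodologies such as in-context learning, prompting, and reinforcement learning from human feedback?
RQ4: What evaluation frameworks exist for LLM safety evaluation, and where are the gaps for future research?
RQ5: What future directions and interventions are proposed for making GAI-LLMs better aligned and safer?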

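Key Findings

Proposes a new component-based taxonomy of LLM safety covering Data Safety, Model Safety, Prompt Safety, Alignment, and Safety at Scale.
Shows how safety risks connect to specific LLM techniques (notably in-context learning, prompting, and reinforcement learning), which enables targeted interventions.
Summarizes the scope and relevance of evaluation frameworks (e.g., HELM, BigBench) and the governance literature as part of ongoing safety evaluation.
Emphasizes the need for unified theories of safety challenges and highlights the gap between the literature and implementations in real systems.
Provides a systematic synthesis to guide the development of aligned and safe LLMs as models scale.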


This review was generated by AI and reviewed by a human editor.