QUICK REVIEW

[논문 리뷰] Improving ChatGPT Prompt for Code Generation

Chao Liu, Xuanlin Bao|arXiv (Cornell University)|2023. 05. 15.

Software Engineering Research인용 수 38

한 줄 요약

이 논문은 체인 오브 생각(chain-of-thought)과 다단계 최적화를 통해 신중하게 설계된 프롬프트가 CodeXGlue에서 ChatGPT의 텍스트-대-코드(text-to-code) 및 코드-대-코드(code-to-code) 생성 성능을 크게 향상시킨다는 것을 보여주고, 간결성, 세션 맥락, 무작위성 등의 요소를 분석한다.

ABSTRACT

Automated code generation can be a powerful technique for software development, significantly reducing developers' efforts and time required to create new code by generating it automatically based on requirements. Recently, OpenAI's language model ChatGPT has emerged as a powerful tool for generating human-like responses to a wide range of textual inputs (i.e., prompts), including those related to code generation. However, the effectiveness of ChatGPT for code generation is not well understood, and the generation performance could be heavily influenced by the choice of prompt. To answer these questions, we conducted experiments using the CodeXGlue dataset to evaluate ChatGPT's capabilities for two code generation tasks, including text-to-code and code-to-code generation. We designed prompts by leveraging the chain-of-thought strategy with multi-step optimizations. Our results showed that by carefully designing prompts to guide ChatGPT, the generation performance can be improved substantially. We also analyzed the factors that influenced the prompt design and provided insights that could guide future research.

연구 동기 및 목표

CodeXGlue를 사용하여 두 가지 작업(T2C 및 C2C)에서 ChatGPT의 코드 생성 성능을 평가한다.
체인 오브 생각(chain-of-thought) 및 다단계 최적화를 통해 ChatGPT를 안내하는 프롬프트 공학 기법을 조사한다.
프롬프트 설계에 영향을 주는 요인을 식별하고 향후 연구를 위한 실행 가능한 통찰을 제공한다.

제안 방법

CodeXGlue를 사용하여 T2C 및 C2C 작업에서 ChatGPT를 평가한다.
수동 체인 오브 생각 기반의 다단계 최적화(P1–P5) 프롬프트를 설계한다.
BLEU 및 CodeBLEU 지표로 프롬프트를 평가하고 개선을 분석한다.
베이스라인 테스트: ChatGPT-task, ChatGPT-detail, ChatGPT-behaviour를 적용한다.
간결성, 세션, 생성 무작위성의 영향력을 탐색한다.
CodeXGlue에서 미세조정된 LLM과 ChatGPT를 비교한다.

실험 결과

연구 질문

RQ1RQ1: 설계된 프롬프트가 T2C 및 C2C 작업에서 ChatGPT에 얼마나 효과적인가?
RQ2RQ2: 간결성 요청이 ChatGPT가 생성한 코드에 어떤 영향을 미치는가?
RQ3RQ3: 연속 세션 대 개별 세션 설정이 ChatGPT 출력에 어떤 영향을 미치는가?
RQ4RQ4: 생성 무작위성이 ChatGPT 코드 품질과 일관성에 어떤 영향을 미치는가?

주요 결과

작업	모델	BLEU	CodeBLEU
T2C	ChatGPT-task	5.63	28.05
T2C	ChatGPT-detail	14.09	39.90
T2C	ChatGPT-behaviour	21.59	48.69
C2C	ChatGPT-task	10.61	46.12
C2C	ChatGPT-detail	15.79	47.71
C2C	ChatGPT-behaviour	9.47	47.38

프롬프트 설계가 T2C 및 C2C 성능을 크게 개선한다(예: 베이스라인 대비 BLEU 및 CodeBLEU 향상).
간결성 요청은 T2C 결과를 향상시키지만(C2C에 대해서는 혼합 효과를 보임).
연속 세션은 C2C에 도움이 되지만 T2C에는 그렇지 않으며, 이 설정에서 개별 세션이 T2C에 더 나은 성능을 나타낸다.
생성 무작위성은 설계된 프롬프트 하에서 큰 영향을 주지 않으며 실행 간 안정적인 결과를 보여준다.

더 나은 연구,지금 바로 시작하세요

연구 설계부터 논문 작성까지, 연구 시간을 획기적으로 줄여보세요.

카드 등록 없음 · 무료 플랜 제공

이 리뷰는 AI가 만들고, 인간 에디터가 검토했습니다.