QUICK REVIEW

[Paper Review] From Copilot to Pilot: Towards AI Supported Software Development

Rohith Pudari, Neil Ernst | arXiv (Cornell University) | March 7, 2023

Software Engineering Research | Citations: 12

One-Line Summary

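This paper evaluates how well GitHub Copilot follows Pythonic idioms and adheres to JavaScript best practices. It also proposes a six-level taxonomy of software abstraction hierarchies to delineate the limits of AI-supported code completion.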

ABSTRACT

AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Above human average performance on programming challenges is now possible. However, software engineering is much more than solving programming contests. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, to follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools like Copilot and offer a simple taxonomy for understanding the classification of AI-supported code completion tools in this space. We first perform an exploratory study on Copilot's code suggestions for language idioms and code smells. Copilot does not follow language idioms or avoid code smells in most of our test scenarios. We then conduct additional investigation to determine the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies where 'basic programming functionality' such as code compilation and syntax checking is at the least abstract level, and software architecture analysis and design are at the most abstract level. We conclude by providing a discussion on challenges for future development of AI-supported code completion tools to reach the design level of abstraction in our taxonomy.

Motivation and Objectives

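Assess the current boundaries of AI-supported code completion tools, using Copilot as the representative tool.
Evaluate Copilot's ability to follow Pythonic idioms and to avoid code smells in JavaScript code.
Propose a taxonomy of software abstraction hierarchies for classifying AI-supported code generation tasks.
Discuss challenges and future directions toward higher-level, design-oriented AI-supported software development.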

Proposed Method

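Sample 25 well-known Pythonic idioms from open-source resources and 25 JavaScript best practices from the Airbnb JavaScript Style Guide.
Trigger Copilot suggestions with a two-part input: the scenario title written as a comment, followed by minimal code.
Evaluate Copilot's top 10 suggestions against the named idiom or practice to assign Pass/Fail to each scenario.
Compare Copilot's performance across idioms and practices to assess how well it aligns with common coding standards.
Develop a six-level taxonomy of software abstractions to map the boundaries of AI-supported code completion.
Illustrate the capabilities at each abstraction level with sorting examples.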
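The scoring step can be sketched in Python. This is a minimal illustration, not the authors' tooling. It assumes each of a scenario's top 10 suggestions has already been judged by hand as idiomatic (True) or not (False). The names `Outcome`, `first_match_rank` and `classify` are hypothetical. The sketch splits scenarios the same three ways the key findings report them:

* the idiom appears in the first suggestion;
* the idiom first appears somewhere in suggestions 2 to 10;
* the idiom is absent from the top 10.

The review does not state how these outcomes map onto Pass/Fail, so the sketch stops at the three-way split.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    """Three-way split used in the findings (labels are illustrative)."""
    TOP1 = "idiomatic code in the first suggestion"
    TOP10 = "idiomatic code first appears in suggestions 2-10"
    FAIL = "no idiomatic code in the top 10"


def first_match_rank(matches: Sequence[bool], k: int = 10) -> Optional[int]:
    """Return the 1-based rank of the first suggestion judged idiomatic.

    `matches` holds one manual judgment per suggestion, in Copilot's order;
    anything past the first `k` suggestions is ignored.
    """
    for rank, is_match in enumerate(matches[:k], start=1):
        if is_match:
            return rank
    return None


def classify(matches: Sequence[bool], k: int = 10) -> Outcome:
    """Map one scenario's top-k judgments to an Outcome."""
    rank = first_match_rank(matches, k)
    if rank is None:
        return Outcome.FAIL
    return Outcome.TOP1 if rank == 1 else Outcome.TOP10


if __name__ == "__main__":
    # Hypothetical judgments: only the 3rd of 10 suggestions is idiomatic.
    example = [False, False, True] + [False] * 7
    print(classify(example).name)  # TOP10
```

Truncating to the first `k` suggestions keeps a late match (say, suggestion 11) from being counted, which mirrors the top-10 cutoff described above.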

Figure 3: Koopman’s Autonomous Vehicle Safety Hierarchy of Needs [26]. SOTIF = safety of the intended functionality.

Experimental Results

Research Questions

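RQ-1: What are the current boundaries of AI-supported code completion tools?
RQ-1.1: How do AI-supported code completion tools handle programming idioms?
RQ-1.2: How do AI-supported code completion tools manage to produce code free of code smells by following best practices?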

Key Findings

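For 2 of the 25 idioms, Copilot's first suggestion matched the Pythonic idiom. For 8 of the remaining 23, the idiom appeared among the top 10 suggestions.
For 15 of the 25 idioms, none of Copilot's top 10 suggestions used the Pythonic approach.
For JavaScript best practices, Copilot's top suggestion followed the guideline in 3 of 25 cases. For 5 of the remaining 22, guideline-compliant code appeared among the top 10.
In 17 of the 25 JavaScript scenarios, no suggestion in the top 10 followed the best practice.
Copilot performed better on common beginner-level tasks (e.g., summing numbers, importing modules). Overall, however, it struggled to consistently produce code aligned with Pythonic idioms or best practices.
The authors propose a six-level taxonomy to bound the capabilities of AI-supported code generation, running from Syntax and Correctness through Paradigms/Idioms and Code Smells up to Design. Copilot performs well at the lower levels. Consistent with the idiom and best-practice results above, it struggles at the higher levels, up to architecture and design.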
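The reported counts can be checked with a little arithmetic. Each language's three buckets should sum to 25, and the shares follow directly. The dictionary below only restates the numbers from the findings above; `REPORTED` and `summarize` are illustrative names, not anything from the paper.

```python
# Counts restated from the findings: 25 scenarios per language.
# "top1"  = first suggestion followed the idiom/practice
# "top10" = first compliant suggestion appeared at rank 2-10
# "fail"  = no compliant suggestion in the top 10
REPORTED = {
    "Python idioms": {"top1": 2, "top10": 8, "fail": 15},
    "JavaScript best practices": {"top1": 3, "top10": 5, "fail": 17},
}


def summarize(counts: dict) -> dict:
    """Turn bucket counts into a total and per-bucket shares."""
    total = sum(counts.values())
    return {
        "total": total,
        "top1_rate": counts["top1"] / total,
        "in_top10_rate": (counts["top1"] + counts["top10"]) / total,
        "fail_rate": counts["fail"] / total,
    }


if __name__ == "__main__":
    for name, counts in REPORTED.items():
        s = summarize(counts)
        assert s["total"] == 25  # each language has 25 scenarios
        print(
            f"{name}: top-1 {s['top1_rate']:.0%}, "
            f"anywhere in top 10 {s['in_top10_rate']:.0%}, "
            f"not in top 10 {s['fail_rate']:.0%}"
        )
```

The output follows directly from the counts:

* Python idioms: 8% top-1, 40% anywhere in the top 10, 60% not in the top 10.
* JavaScript practices: 12% top-1, 32% anywhere in the top 10, 68% not in the top 10.

In both languages, Copilot surfaced a compliant suggestion somewhere in its top 10 for fewer than half of the scenarios.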

Figure 4: Hierarchy of software abstractions. Copilot cleared all green levels and struggled in red levels.
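The lower levels of the hierarchy can be illustrated with a sorting task, in the spirit of the sorting examples mentioned in the method. The sketch below is ours, not Copilot output or code from the paper, and `parses`, `sort_loop` and `sort_idiomatic` are hypothetical names. Each function stands in for one level:

* `parses` stands in for the Syntax level.
* Both sort functions would pass the Correctness level.
* Only `sort_idiomatic` reflects the Paradigms/Idioms level, because it uses the built-in `sorted()`.

Code-smell and design judgments are beyond what a snippet this small can show.

```python
import ast


def parses(source: str) -> bool:
    """Syntax level: does the snippet parse as Python at all?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def sort_loop(values):
    """Correct but non-idiomatic: a hand-written insertion sort on a copy."""
    result = list(values)  # copy so the caller's list is not mutated
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot right to make room for `key`.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def sort_idiomatic(values):
    """Idiomatic: delegate to the built-in sorted(), which returns a new list."""
    return sorted(values)


if __name__ == "__main__":
    data = [3, 1, 2, 1]
    print(parses("sorted(data)"), parses("sorted(data"))  # True False
    print(sort_loop(data), sort_idiomatic(data))  # [1, 1, 2, 3] [1, 1, 2, 3]
```

Both functions return a new list and leave the input unchanged. The loop version, however, is longer and easier to get wrong. That gap between code that merely works and code that follows the language's conventions is what the idiom level is meant to capture.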


This review was generated by AI and reviewed by a human editor.