QUICK REVIEW

[Paper Review] The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot

Doron Yeverechyahu, Raveesh Mayya | arXiv (Cornell University) | 2024-09-12

Open Source Software Innovations | Citations: 7

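One-Line Summary

This paper exploits a natural experiment around the 2021 launch of GitHub Copilot to show that Copilot availability increased open-source contributions by 28-40% and that incremental contributions grew faster than substantive ones, and it analyzes how context and model upgrades shift the balance between exploitation and exploration in code collaboration.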

ABSTRACT

Large Language Models (LLMs) have been shown to enhance individual productivity in guided settings. Whereas LLMs are likely to also transform innovation processes in a collaborative work setting, it is unclear what trajectory this transformation will follow. Innovation in these contexts encompasses both capability innovation that explores new possibilities by acquiring new competencies in a project and iterative innovation that exploits existing foundations by enhancing established competencies and improving project quality. Whether LLMs affect these two aspects of collaborative work and to what extent is an open empirical question. Open-source development provides an ideal setting to examine LLM impacts on these innovation types, as its voluntary and open/collaborative nature of contributions provides the greatest opportunity for technological augmentation. We focus on open-source projects on GitHub by leveraging a natural experiment around the selective rollout of GitHub Copilot (a programming-focused LLM) in October 2021, where GitHub Copilot selectively supported programming languages like Python or Rust, but not R or Haskell. We observe a significant jump in overall contributions, suggesting that LLMs effectively augment collaborative innovation in an unguided setting. Interestingly, Copilot's launch increased iterative innovation focused on maintenance-related or feature-refining contributions significantly more than it did capability innovation through code-development or feature-introducing commits. This disparity was more pronounced after the model upgrade in June 2022 and was evident in active projects with extensive coding activity, suggesting that as both LLM capabilities and/or available contextual information improve, the gap between capability and iterative innovation may widen. We discuss practical and policy implications to incentivize high-value innovative solutions.

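Research Motivation and Objectives

Motivates the question of how large language models affect voluntary, self-directed open-source collaboration rather than work inside an organizational setting.
Distinguishes contribution types by their level of cognitive demand: substantive (new features) vs. incremental (maintenance and refinement).
Identifies the causal effect of Copilot availability using a quasi-experimental design and multiple identification strategies.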
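
To make the substantive vs. incremental distinction concrete, here is a minimal sketch of a keyword-based commit-message classifier. This review does not describe how the paper labels contributions, so the keyword lists, the tie-breaking rule, and the function name `classify_commit` are all hypothetical and serve only as an illustration.

```python
import re

# Hypothetical keyword lists (assumptions for illustration only; the paper's
# actual classification rules are not given in this review).
INCREMENTAL_WORDS = ("fix", "bug", "refactor", "cleanup", "typo", "docs", "update", "bump")
SUBSTANTIVE_WORDS = ("add", "implement", "introduce", "feature", "support", "new")


def classify_commit(message: str) -> str:
    """Label a commit message as 'incremental', 'substantive', or 'unclassified'."""
    # Lowercase the message and keep only alphabetic tokens.
    words = set(re.findall(r"[a-z]+", message.lower()))
    incremental_hits = sum(w in words for w in INCREMENTAL_WORDS)
    substantive_hits = sum(w in words for w in SUBSTANTIVE_WORDS)
    # Ties (including no hits at all) are left unclassified rather than guessed.
    if incremental_hits == substantive_hits:
        return "unclassified"
    return "incremental" if incremental_hits > substantive_hits else "substantive"


if __name__ == "__main__":
    for msg in ("Fix typo in README", "Add support for streaming API", "Merge branch main"):
        print(f"{msg!r}: {classify_commit(msg)}")
```

A keyword rule like this is crude and easy to audit, which is why it is used here. Any real study would need a validated labeling scheme.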

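Proposed Method

Exploits a natural experiment built on the exogenous split created by Copilot's 2021 launch: some languages, such as Python, were supported while others, such as R, were not, a division driven by business reasons.
Applies three complementary identification strategies to estimate causal effects.
Classifies contributions as substantive or incremental using two classification approaches.
Estimates, across specifications, the percentage increase in overall contributions and in each contribution type.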
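
As a rough illustration of the natural-experiment logic, the sketch below computes a two-by-two difference-in-differences on log contribution counts, comparing projects in supported languages (treated) with projects in unsupported languages (control) before and after launch. This review does not state the paper's exact estimator, so the estimator itself, the log(1 + count) transform, the function names `did_estimate` and `log_points_to_percent`, and the toy numbers are all assumptions for illustration, not the authors' specification or data.

```python
from math import exp, log
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences on log(1 + contributions).

    rows: iterable of (treated: bool, post: bool, contributions: int).
    Returns the estimate in log points.
    """
    cells = {}
    for treated, post, count in rows:
        cells.setdefault((treated, post), []).append(log(1 + count))
    required = [(True, True), (True, False), (False, True), (False, False)]
    missing = [cell for cell in required if cell not in cells]
    if missing:
        raise ValueError(f"no observations for cells: {missing}")
    m = {cell: mean(values) for cell, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


def log_points_to_percent(beta):
    """Convert a log-point estimate into a percentage change."""
    return (exp(beta) - 1) * 100


# Toy data, NOT from the paper: (treated, post, monthly contributions).
rows = [
    (True, False, 9), (True, False, 9),     # supported language, before launch
    (True, True, 29), (True, True, 29),     # supported language, after launch
    (False, False, 9), (False, False, 9),   # unsupported language, before launch
    (False, True, 19), (False, True, 19),   # unsupported language, after launch
]

if __name__ == "__main__":
    beta = did_estimate(rows)
    print(f"DiD estimate: {beta:.4f} log points "
          f"({log_points_to_percent(beta):.1f}% relative increase)")
```

With these toy numbers the treated group grows by log 3 and the control group by log 2, so the estimate is log 1.5, a 50% relative increase. The numbers were chosen only so the arithmetic is easy to check by hand.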

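Experimental Results

Research Questions

RQ1: Does Copilot availability causally increase open-source contributions on GitHub?
RQ2: Are incremental contributions affected by Copilot more than substantive contributions?
RQ3: How do activity level and model upgrades moderate the balance between exploiting the existing codebase and exploring new functionality?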

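Key Findings

Copilot availability increases open-source contributions by 28-40%.
Incremental contributions increase more than substantive contributions in every specification.
The amplification effect is larger in projects with higher activity levels and widens further after the June 2022 model upgrade.
LLMs bias collaborative innovation toward exploiting the existing codebase rather than exploring new functionality.
The study provides causal field evidence on the effects of LLMs in a fast-moving knowledge economy.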

This review was generated by AI and reviewed by a human editor.