QUICK REVIEW

[Paper Review] Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Mayank Mishra, Matt Stallone|arXiv (Cornell University)|May 7, 2024

Software Engineering Research | Cited by 10

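One-Line Summary

The Granite Code Models are a family of open, decoder-only code LLMs (3B–34B parameters) trained on 116 programming languages. They perform strongly among open models across a broad range of capabilities, including code generation, fixing, and explanation, and are released under Apache 2.0 for research and commercial use.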

ABSTRACT

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reaches state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.

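Motivation and Objectives

Motivate the need for efficient, enterprise-oriented code LLMs with broad capabilities beyond code generation.
Present the open Granite Code model family at 3B, 8B, 20B, and 34B parameters, each in Base and Instruct variants.
Describe data collection, model architecture, training and instruction tuning, and evaluation across a variety of coding tasks.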

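Proposed Method

Two-phase pretraining on 3.5–4.5T tokens of code and text across 116 languages (Phase 1: code only; Phase 2: code plus language data)
Decoder-only transformer architecture with pre-normalization and size-dependent design choices (RoPE, GQA/MQA, SwiGLU, RMSNorm/LayerNorm)
Causal language modeling objective combined with a Fill-In-the-Middle (FIM) objective through the mixed loss L = alpha*L_CLM + (1-alpha)*L_FIM, with alpha=0.5 during pretraining and alpha=1 during instruction finetuning
Instruction tuning on CommitPack-filtered data, natural-language instruction datasets, MathInstruct/MetaMathQA, and synthetic code datasets to improve reasoning and instruction following
Evaluation on benchmarks spanning many programming languages (HumanEvalPack, MBPP(+), RepoBench, ReCode, and others), including multilingual and reasoning-focused evaluations, with comparisons against open-source code LLMs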
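
To make the mixed objective above concrete, here is a minimal Python sketch of a Fill-In-the-Middle transformation and of the loss L = alpha*L_CLM + (1-alpha)*L_FIM. It is an illustration under stated assumptions, not the authors' training code: the sentinel strings, the character-level split, the prefix-suffix-middle ordering, and the helper names `to_fim_example`, `mean_nll`, and `mixed_loss` are all hypothetical, and the losses are computed from hand-supplied probabilities rather than from a real model.

```python
import math
import random

# Hypothetical sentinel strings; the review does not say which FIM tokens are used.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def to_fim_example(text, rng):
    """Split text into prefix/middle/suffix and emit it in prefix-suffix-middle order."""
    # Pick two cut points (possibly equal) and sort them so that i <= j.
    i, j = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # The model sees prefix and suffix first, then learns to generate the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def mean_nll(token_probs):
    """Average negative log-likelihood of the probabilities given to the correct tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def mixed_loss(clm_probs, fim_probs, alpha):
    """Mixed objective from the text: alpha * L_CLM + (1 - alpha) * L_FIM."""
    return alpha * mean_nll(clm_probs) + (1.0 - alpha) * mean_nll(fim_probs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(to_fim_example("def add(a, b):\n    return a + b\n", rng))
    # alpha=0.5 weights both objectives equally; alpha=1 reduces to plain causal LM loss.
    print(mixed_loss([0.9, 0.8], [0.7, 0.6], alpha=0.5))
    print(mixed_loss([0.9, 0.8], [0.7, 0.6], alpha=1.0))
```

With alpha=1 the FIM term drops out entirely, which matches the instruction-finetuning setting listed above.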

Figure 1: Comparison of Granite-8B-Code (Base/Instruct) with other open source (code) LLMs of similar size on HumanEvalPack (Muennighoff et al., 2023), spanning 3 coding tasks and 6 programming languages. See Tables 3, 10, 11 for more details. Best viewed in color.

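Experimental Results

Research Questions

RQ1: Can Granite Code models achieve state-of-the-art or competitive performance among open-source models across a broad range of code tasks such as generation, fixing, explanation, editing, and translation?
RQ2: How well do the Base and instruction-tuned Granite Code models perform across multiple languages and benchmarks beyond Python?
RQ3: Which data collection, filtering, and training strategies enable trustworthy, license-compliant enterprise use?
RQ4: Do two-phase training and instruction tuning improve reasoning and problem-solving ability on code tasks?
RQ5: How do Granite Code models compare with larger general-purpose open LLMs on code tasks?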

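Key Findings

Granite-8B-Code-Base outperforms the competing CodeGemma-8B on HumanEvalPack by about 12 points (33.2% vs. 21.3%) while being trained on fewer tokens (4.5T vs. 7.5T).
Granite-8B-Code-Base performs strongly on CodeFix and CodeExplain and shows competitive results across languages on HumanEvalPack and MultiPL-E.
Instruction-tuned Granite Code models outperform CodeLlama instruct models of comparable size, and the Granite 3B, 8B, and 20B models surpass larger CodeLlama variants in several settings.
On HumanEvalSynthesize (6 languages), the Granite 3B, 8B, and 20B Code-Base models achieve top performance among base models; in some cases even the 3B Instruct model outperforms larger CodeLlama-Instruct models.
Across MultiPL-E (18 languages), Granite-8B-Code-Base outperforms CodeLlama-7B on 16 of 18 languages, and Granite-34B-Code-Base outperforms CodeLlama-34B on many languages.
On MBPP and MBPP+, Granite-8B-Code-Base is competitive, and the 20B and 34B variants achieve strong scores relative to comparable peer models.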
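
The headline comparison with CodeGemma-8B can be unpacked with a few lines of arithmetic. The sketch below only uses the four figures quoted in the first finding; the variable names are illustrative.

```python
# Figures quoted in the first finding above.
granite_score, codegemma_score = 33.2, 21.3    # HumanEvalPack scores (%)
granite_tokens, codegemma_tokens = 4.5, 7.5    # training tokens (trillions)

abs_gap = granite_score - codegemma_score        # gap in percentage points
rel_gain = abs_gap / codegemma_score             # improvement relative to CodeGemma-8B
token_ratio = granite_tokens / codegemma_tokens  # share of CodeGemma-8B's token budget

print(f"gap: {abs_gap:.1f} points, relative gain: {rel_gain:.0%}, token ratio: {token_ratio:.0%}")
# prints: gap: 11.9 points, relative gain: 56%, token ratio: 60%
```

In other words, the "about 12 points" is an 11.9-point absolute gap, reached with 60% of the training tokens.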

Figure 2: An overview of depth upscaling (Kim et al., 2024) for efficient training of Granite-34B-Code. We utilize the 20B model after 1.6T tokens to start training of 34B model with the same code pretraining data without any changes to the training and inference framework.
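
The general idea behind depth upscaling can be sketched as list manipulation over a model's layer stack: take two copies of a trained model, drop the last k layers from the first copy and the first k layers from the second, then stack the remainders into a deeper model. The sketch below follows that idea as commonly described for Kim et al. (2024); the function name `depth_upscale`, the layer count, and the value of k are illustrative placeholders and are not taken from this review.

```python
def depth_upscale(layers, k):
    """Stack two copies of `layers`: the first without its last k, the second without its first k."""
    if not 0 <= k < len(layers):
        raise ValueError("k must satisfy 0 <= k < len(layers)")
    first = layers[: len(layers) - k]   # first copy, final k layers removed
    second = layers[k:]                 # second copy, initial k layers removed
    return first + second


# Illustrative 6-layer "model"; strings stand in for real transformer blocks.
base = [f"layer_{i}" for i in range(6)]
upscaled = depth_upscale(base, k=2)
print(len(upscaled))  # 8
print(upscaled)
```

In a real framework the second copy's blocks would need to be separate parameter copies rather than shared references, so that the duplicated layers can diverge during continued training.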


This review was generated by AI and checked by a human editor.