QUICK REVIEW

[Paper Review] TinyLlama: An Open-Source Small Language Model

Peiyuan Zhang, Guangtao Zeng|arXiv (Cornell University)|Jan 4, 2024

Natural Language Processing Techniques|Cited by 57

ONE-SENTENCE SUMMARY

TinyLlama is a compact 1.1B decoder-only language model built on the Llama 2 architecture and open-source speedups; pretrained on about 3 trillion tokens (≈3 epochs), it achieves strong performance among open-source models of comparable size.

ABSTRACT

We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.

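MOTIVATION AND OBJECTIVES

Evaluate whether a small 1.1B model can reach strong performance when pretrained on a very large amount of data.
Leverage open-source efficiency improvements to increase training speed and reduce memory use.
Compare TinyLlama with existing open-source models of similar size on commonsense and reasoning tasks.
Demonstrate openness and reproducibility by releasing the data, code, and checkpoints.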

PROPOSED METHOD

Adopt the Llama 2 decoder-only architecture and tokenizer.
Pretrain on a mixture of SlimPajama natural language data and Starcoderdata code data (~950B tokens).
Train for ~3 epochs over this corpus, about 3 trillion tokens in total.
Apply speed/efficiency optimizations: Fully Sharded Data Parallel (FSDP), FlashAttention, xFormers adjustments, and grouped-query attention.
Use RoPE positional embeddings and RMSNorm with SwiGLU activations.
Pretrain with an autoregressive language-modeling objective, using AdamW and a cosine learning-rate schedule with 2,000 warmup steps.
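
To make the learning-rate schedule in the last item concrete, here is a minimal sketch of a cosine schedule with linear warmup. Only the 2,000 warmup steps come from this review; the function name `lr_at_step`, the peak and minimum learning rates, and the total step count are illustrative assumptions, and the authors' actual implementation may differ.

```python
import math


def lr_at_step(step, max_lr, min_lr, warmup_steps, total_steps):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear ramp from 0 up to max_lr over the warmup phase.
        return max_lr * step / warmup_steps
    if step >= total_steps:
        # After the schedule ends, hold the learning rate at its floor.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


WARMUP_STEPS = 2_000   # stated in the review
MAX_LR = 4e-4          # assumption, for illustration only
MIN_LR = 4e-5          # assumption, for illustration only
TOTAL_STEPS = 100_000  # assumption, for illustration only

if __name__ == "__main__":
    for step in (0, 1_000, 2_000, 51_000, 100_000):
        lr = lr_at_step(step, MAX_LR, MIN_LR, WARMUP_STEPS, TOTAL_STEPS)
        print(f"step {step:>7}: lr = {lr:.6f}")
```

Under these assumed values the rate climbs linearly to its peak at step 2,000, passes the midpoint between peak and floor halfway through the decay phase, and settles at the floor at the final step.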

EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a 1.1B parameter model achieve competitive performance when trained on an unusually large dataset (~3T tokens)?
RQ2: Do open-source efficiency improvements enable faster training and lower memory use without sacrificing performance?
RQ3: How does TinyLlama compare to other 1B-scale open-source models on commonsense reasoning and problem-solving benchmarks?

Key Findings

TinyLlama significantly outperforms OPT-1.3B and Pythia-1.4B on several downstream tasks in zero-shot evaluation.
The model achieves competitive performance among similar-size open-source LMs on commonsense reasoning benchmarks (e.g., HellaSwag, OpenBookQA, WinoGrande, ARC, BoolQ, PIQA).
With the efficiency optimizations, training reaches high throughput (≈24,000 tokens/s per A100-40G) and requires fewer GPU-hours than comparable models.
TinyLlama demonstrates stronger problem-solving capabilities on InstructEval tasks (MMLU, BBH, HumanEval, DROP) than the baselines examined.
The model is fully open-source, with pretraining code, intermediate checkpoints, and data processing details released.
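
As a rough sanity check on the figures quoted in this review (~950B corpus tokens, ~3T training tokens, ≈24,000 tokens/s per A100-40G), the sketch below works out the implied number of epochs and the GPU time for the full run. It assumes the throughput is sustained for the whole run and that scaling across GPUs is perfect; the GPU count `NUM_GPUS` is a hypothetical value chosen for illustration and is not stated in this review.

```python
def epochs(total_tokens, corpus_tokens):
    """Number of passes over the corpus implied by the token budget."""
    return total_tokens / corpus_tokens


def gpu_hours(total_tokens, tokens_per_sec_per_gpu):
    """GPU-hours needed to process total_tokens at a sustained per-GPU rate."""
    return total_tokens / tokens_per_sec_per_gpu / 3600


def wall_clock_days(total_gpu_hours, num_gpus):
    """Wall-clock days, assuming perfect scaling across num_gpus."""
    return total_gpu_hours / num_gpus / 24


CORPUS_TOKENS = 950_000_000_000          # ~950B, from the review
TOTAL_TOKENS = 3_000_000_000_000         # ~3T, from the review
TOKENS_PER_SEC_PER_GPU = 24_000          # per A100-40G, from the review
NUM_GPUS = 16                            # hypothetical, for illustration only

if __name__ == "__main__":
    hours = gpu_hours(TOTAL_TOKENS, TOKENS_PER_SEC_PER_GPU)
    print(f"epochs:          {epochs(TOTAL_TOKENS, CORPUS_TOKENS):.2f}")
    print(f"GPU-hours:       {hours:,.0f}")
    print(f"wall-clock days: {wall_clock_days(hours, NUM_GPUS):.1f}")
```

With these inputs the budget corresponds to a little over 3 passes over the corpus, consistent with the "~3 epochs" figure, and to roughly 34,700 GPU-hours for the full 3T tokens. On the assumed 16 GPUs that is about 90 days of wall-clock time.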

This review was generated by AI and checked by a human editor.