[Paper Review] TinyLlama: An Open-Source Small Language Model
TinyLlama is a compact 1.1B decoder-only language model built on the Llama 2 architecture and open-source speedups. Trained on roughly 3 trillion tokens (about 3 epochs), it achieves strong performance among open-source models of similar size.
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
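Motivation and Objectives
- Evaluate whether a small 1.1B model can deliver strong performance when pretrained on a very large dataset.
- Use open-source efficiency improvements to increase training speed and reduce memory use.
- Compare TinyLlama with existing open-source models of similar size on commonsense and reasoning tasks.
- Demonstrate openness and reproducibility by releasing the data, code, and checkpoints.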
Proposed Method
- Llama 2-style decoder architecture and tokenizer.
- Pretrain on a mixture of SlimPajama natural language data and Starcoderdata code data (~950B tokens).
- Train for ~3 epochs with about 3 trillion tokens total.
- Apply speed/efficiency optimizations: Fully Sharded Data Parallel (FSDP), FlashAttention, xFormers adjustments, and grouped-query attention.
- Use RoPE positional embeddings and RMSNorm with SwiGLU activations.
- Pretraining follows the autoregressive LM objective with AdamW and a cosine learning-rate schedule with 2,000 warmup steps.
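The learning-rate schedule named in the last item can be sketched in a few lines. This is a minimal illustration, not the authors' training code: the linear warmup shape, the function name `lr_at_step`, and the values of `peak_lr`, `min_lr`, and `total_steps` are assumptions chosen for the example. Only the 2,000 warmup steps and the cosine decay come from the review.

```python
import math


def lr_at_step(step, total_steps, warmup_steps=2000, peak_lr=4e-4, min_lr=4e-5):
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    peak_lr and min_lr are illustrative assumptions; only warmup_steps=2000
    is taken from the review text.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    total = 10_000  # hypothetical run length, for illustration only
    for s in (0, 1999, 2000, 6000, 10_000):
        print(s, lr_at_step(s, total))
```

In a real run, a function like this would typically be evaluated once per optimizer step and written into the AdamW parameter groups.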
Experimental Results
Research Questions
- RQ1: Can a 1.1B parameter model achieve competitive performance when trained on an unusually large dataset (~3T tokens)?
- RQ2: Do open-source efficiency improvements enable faster training and lower memory use without sacrificing performance?
- RQ3: How does TinyLlama compare to other 1B-scale open-source models on commonsense reasoning and problem-solving benchmarks?
Key Findings
- TinyLlama significantly outperforms OPT-1.3B and Pythia-1.4B on several downstream tasks in zero-shot evaluation.
- The model achieves competitive performance among similar-size open-source LMs on commonsense reasoning benchmarks (e.g., HellaSwag, OpenBookQA, WinoGrande, ARC, BoolQ, PIQA).
- Training with ~3T tokens and efficiency optimizations yields high throughput (≈24,000 tokens/s per A100-40G) and requires fewer GPU-hours than comparable models.
- TinyLlama demonstrates stronger problem-solving capabilities on InstructEval tasks (MMLU, BBH, HumanEval, DROP) than the baselines examined.
- The model remains open-source, with pretraining code, intermediate checkpoints, and data processing details released.
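The token and throughput figures quoted in this review can be turned into a rough compute estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes the quoted ≈24,000 tokens/s per A100-40G is sustained for the whole run and ignores restarts, evaluation, and other overhead. The helper name `gpu_hours` is hypothetical.

```python
def gpu_hours(total_tokens, tokens_per_sec_per_gpu):
    """GPU-hours needed to process total_tokens at a fixed per-GPU throughput."""
    return total_tokens / tokens_per_sec_per_gpu / 3600


# Figures quoted in the review.
unique_tokens = 950e9          # ~950B tokens in the SlimPajama + Starcoderdata mix
epochs = 3                     # ~3 passes over the data
throughput = 24_000            # tokens/s per A100-40G

total_tokens = unique_tokens * epochs  # 2.85e12, i.e. "about 3 trillion"

if __name__ == "__main__":
    print(f"total tokens: {total_tokens:.3e}")
    print(f"GPU-hours at 2.85T tokens: {gpu_hours(total_tokens, throughput):,.0f}")
    print(f"GPU-hours at 3T tokens:    {gpu_hours(3e12, throughput):,.0f}")
```

Under these assumptions, a 3T-token run works out to roughly 34,700 A100 GPU-hours. That is an estimate derived from the quoted throughput, not a number reported in the review.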
This review was generated by AI and checked by a human editor.