QUICK REVIEW

[Paper Review] Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning

Hao Chen, Yiming Zhang|arXiv (Cornell University)|May 16, 2023

Topic Modeling | Citations: 8

One-Line Summary

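This paper investigates Low Training Data Instruction Tuning (LTD Instruction Tuning) and shows that task-specific LLMs can reach competitive performance using less than 0.5% of the original dataset. It also reports notable findings on single versus multiple instructions and on data efficiency for NLI tasks.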

ABSTRACT

Instruction tuning for large language models (LLMs) has gained attention from researchers due to its ability to unlock the potential of LLMs in following instructions. While instruction tuning offers advantages for facilitating the adaptation of large language models (LLMs) to downstream tasks as a fine-tuning approach, training models with tens of millions or even billions of parameters on large amounts of data results in unaffordable computational costs. To address this, we focus on reducing the data used in LLM instruction tuning to decrease training costs and improve data efficiency, dubbed as Low Training Data Instruction Tuning (LTD Instruction Tuning). Specifically, this paper conducts a preliminary exploration into reducing the data used in LLM training and identifies several observations regarding task specialization for LLM training, such as the optimization of performance for a specific task, the number of instruction types required for instruction tuning, and the amount of data required for task-specific models. The results suggest that task-specific models can be trained using less than 0.5% of the original dataset, with a 2% improvement in performance over those trained on full task-related data.

Motivation and Objectives

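Reduce the training cost of large language models through data-efficient instruction tuning.
Investigate whether task-specific models can be trained on very small, task-focused datasets.
Determine how many instruction types and how much data are needed for effective task specialization.
Propose a data selection method that retrieves core samples for efficient instruction tuning.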

Proposed Method

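Format the data as instruction-style prompts and encode each sentence into an embedding.
Cluster the embeddings with K-means to locate the center of the task distribution.
Use a coreset method (KCenterGreedy) to select a small, representative subset of the task data.
Fine-tune Galactica-1.3b for NLI on the core samples retrieved from the task pool.
Evaluate by computing the product of token probabilities for each answer option and choosing the highest-scoring option.
Ablate the sampling strategy to compare coreset selection against similarity-based methods.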
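The selection steps above (embed, locate the distribution center, pick a small representative subset with KCenterGreedy) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: embeddings are taken as already computed (the encoder is not specified here), Euclidean distance is used, and the K-means step is reduced to a single centroid (k = 1, i.e. the mean) that seeds the greedy search. The function names are hypothetical; the greedy farthest-point rule is the standard k-center greedy heuristic.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Vector]) -> List[float]:
    """Mean of all embeddings, i.e. the K-means centroid when k = 1 (assumption)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def k_center_greedy(points: List[Vector], budget: int, seeds: List[int]) -> List[int]:
    """Greedy k-center coreset selection.

    Starting from `seeds`, repeatedly add the unselected point that is
    farthest from its nearest selected point, until `budget` points are chosen.
    """
    selected = list(seeds)
    chosen = set(selected)
    # Distance from each point to its nearest selected center.
    min_dist = [min(euclidean(p, points[s]) for s in selected) for p in points]
    while len(selected) < min(budget, len(points)):
        candidates = [i for i in range(len(points)) if i not in chosen]
        # The worst-covered point becomes the next center.
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Update coverage distances with the new center.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], euclidean(p, points[nxt]))
    return selected


def select_coreset(points: List[Vector], budget: int) -> List[int]:
    """Hypothetical helper: seed with the point closest to the centroid, then grow greedily."""
    center = centroid(points)
    seed = min(range(len(points)), key=lambda i: euclidean(points[i], center))
    return k_center_greedy(points, budget, [seed])


# Toy 2-D "embeddings" forming three well-separated groups.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [10.0, 0.0]]
print(select_coreset(embeddings, 3))
```

On this toy input, a budget of 3 picks one point from each group: the seed nearest the middle, then the two far corners. The greedy rule favors coverage of the whole distribution, which is the intuition behind using a coreset rather than only the most similar examples.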
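The evaluation step (score each answer option by the product of its token probabilities, then pick the best) can be sketched as follows. This assumes the per-token probabilities of each option have already been obtained from the model; how they are extracted from Galactica-1.3b is not shown, and the function names and example probabilities are hypothetical. Summing log-probabilities is used instead of multiplying raw probabilities: it gives the same ranking but avoids floating-point underflow.

```python
import math
from typing import Dict, List


def option_log_score(token_probs: List[float]) -> float:
    """Log of the product of an option's token probabilities.

    Summing logs ranks options exactly like multiplying the probabilities,
    but avoids underflow. All probabilities must be > 0.
    """
    return sum(math.log(p) for p in token_probs)


def pick_answer(option_token_probs: Dict[str, List[float]]) -> str:
    """Return the option whose tokens have the highest joint probability."""
    return max(option_token_probs, key=lambda opt: option_log_score(option_token_probs[opt]))


# Hypothetical per-token probabilities for three NLI answer options.
example = {
    "entailment": [0.9, 0.95],
    "neutral": [0.3],
    "contradiction": [0.5],
}
print(pick_answer(example))  # entailment
```

An unnormalized product favors shorter options, since every extra token multiplies in a factor below 1; whether the authors normalize by option length is not stated in this review.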

Experimental Results

Research Questions

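RQ1: Can a task-specific LLM be trained effectively on less than 0.5% of the original dataset?
RQ2: Under LTD instruction tuning, how do the number and diversity of instruction types affect task-specific performance?
RQ3: Which data selection strategy yields the most representative core samples for efficient instruction tuning?
RQ4: For NLI, does a single instruction give the best task-specific performance, or do multiple instructions bring a small improvement?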

Key Findings

Accuracy (%) on NLI benchmarks:
Model	RTE	CB	ANLI R1	ANLI R2	ANLI R3	Avg.
Vanilla Model (0%)	54.51	41.07	33.40	33.40	33.58	39.19
P3 (100%)	76.17	75.00	44.00	35.70	39.42	54.06
Fixed Instruction (10%)	71.11	66.07	43.60	38.90	42.17	52.37
NLI-related (5%)	79.06	82.14	60.40	46.50	46.67	62.95
NLI coreset (0.5%)	74.73	73.21	49.60	41.90	43.75	56.64
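The Avg. column can be reproduced as the unweighted mean of the five per-task scores. The check below recomputes it from the table and also gives the 0.5% coreset's margin over P3 (100%). Treating Avg. as an unweighted mean is an assumption, which the recomputed values happen to match.

```python
# Per-task accuracies copied from the table (RTE, CB, ANLI R1, R2, R3).
results = {
    "Vanilla Model (0%)": [54.51, 41.07, 33.40, 33.40, 33.58],
    "P3 (100%)": [76.17, 75.00, 44.00, 35.70, 39.42],
    "Fixed Instruction (10%)": [71.11, 66.07, 43.60, 38.90, 42.17],
    "NLI-related (5%)": [79.06, 82.14, 60.40, 46.50, 46.67],
    "NLI coreset (0.5%)": [74.73, 73.21, 49.60, 41.90, 43.75],
}

# Unweighted mean over the five benchmarks, rounded like the table.
averages = {name: round(sum(scores) / len(scores), 2) for name, scores in results.items()}
for name, avg in averages.items():
    print(f"{name:<25} {avg:.2f}")

# Margin of the 0.5% coreset over the model trained on all of P3.
gain_vs_p3 = round(averages["NLI coreset (0.5%)"] - averages["P3 (100%)"], 2)
print(f"Coreset vs. P3: +{gain_vs_p3:.2f} points")
```

The recomputed means agree with the Avg. column to two decimals, and the coreset's margin over P3 (100%) comes out to 2.58 points.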

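Task-specific models can be trained on less than 0.5% of the original dataset and still match or exceed models trained on the full data: in the table above, the 0.5% NLI coreset averages 56.64, versus 54.06 for P3 (100%).
For NLI, training on target-task data with a single instruction matches or outperforms every multi-instruction P3 setting.
The diversity of instruction formats has limited impact on task-specific performance; task-focused data alone can be sufficient.
Coreset-based sampling (KCenterGreedy) yields a large improvement over the vanilla baseline: with 0.5% of the data, the average rises from 39.19 to 56.64.
16k instances (1.9M tokens, 0.5% of P3) are enough to train an NLI task-specific model.
Similarity-based sampling strategies (topK, leastK, and mixed selection by cosine similarity) tend to underperform the coreset method, suggesting that representative core samples matter.
Compared with the vanilla model, the NLI coreset model achieves higher accuracy on each of RTE, CB, and ANLI R1-R3, and on average.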
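The data-size figures in the findings above (16k instances, 1.9M tokens, 0.5% of P3) can be cross-checked with back-of-envelope arithmetic. This is a rough sketch: the inputs are the rounded values quoted in this review, so the derived average length and implied P3 size are approximations, not numbers reported by the authors.

```python
# Rounded figures quoted in this review.
instances = 16_000        # training instances in the NLI coreset
tokens = 1_900_000        # total training tokens
fraction_of_p3 = 0.005    # share of P3 used (0.5%)

# Average instance length in tokens.
tokens_per_instance = tokens / instances

# P3 size implied if 16k instances are exactly 0.5% of it.
implied_p3_instances = round(instances / fraction_of_p3)

print(f"~{tokens_per_instance:.0f} tokens per instance")
print(f"~{implied_p3_instances:,} P3 instances implied")
```

This gives roughly 119 tokens per instance and a P3 pool on the order of 3.2M instances; since the source says "less than 0.5%", the actual pool would be at least that large, subject to the rounding of "16k".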
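The similarity-based baselines named in the findings can be sketched as ranking candidates by cosine similarity to a reference embedding, then taking the most similar (topK), the least similar (leastK), or a split of both (mixed). This is a minimal sketch under assumptions: the reference vector (for example, the centroid of the target-task embeddings) and the even top/bottom split for mixed are hypothetical choices, not details confirmed by the source.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_sample(candidates: List[Sequence[float]], reference: Sequence[float],
                      k: int, strategy: str) -> List[int]:
    """Select k candidate indices by cosine similarity to `reference`."""
    n = len(candidates)
    k = max(0, min(k, n))
    # Indices sorted from most to least similar.
    order = sorted(range(n), key=lambda i: cosine(candidates[i], reference), reverse=True)
    if strategy == "topK":
        return order[:k]
    if strategy == "leastK":
        return order[n - k:]
    if strategy == "mixed":
        top = k - k // 2   # hypothetical split: ceil(k/2) most similar
        least = k // 2     # and floor(k/2) least similar
        return order[:top] + order[n - least:]
    raise ValueError(f"unknown strategy: {strategy}")


# Toy candidates compared against a hypothetical reference direction.
cands = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
ref = [1.0, 0.0]
for s in ("topK", "leastK", "mixed"):
    print(s, similarity_sample(cands, ref, 2, s))
```

Unlike the coreset sketch, these strategies only look at closeness to one reference point, so they can pick many near-duplicates (topK) or off-task outliers (leastK); that contrast is one plausible reading of why they trail the coreset method.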


This review was generated by AI and checked by a human editor.