QUICK REVIEW

[Paper Review] Self-Alignment with Instruction Backtranslation

Xian Li, Ping Yu|arXiv (Cornell University)|Aug 11, 2023

Natural Language Processing Techniques | Cited by 13

One-Sentence Summary

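This paper proposes instruction backtranslation, an iterative self-training method that uses a seed model to generate and filter high-quality (instruction, output) pairs from unlabelled web data, achieving strong instruction-following ability without model distillation.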

ABSTRACT

We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high quality examples from among these candidates (self-curation). This data is then used to finetune a stronger model. Finetuning LLaMa on two iterations of our approach yields a model that outperforms all other LLaMa-based models on the Alpaca leaderboard not relying on distillation data, demonstrating highly effective self-alignment.

Research Motivation and Goals

Enable scalable instruction tuning without heavy reliance on human-annotated data or distillation.
Introduce a two-step self-training pipeline (self-augmentation and self-curation) driven by the model itself.
Demonstrate iterative improvement leading to competitive instruction-following models on benchmarks.
Show that data quality control is essential for effectively scaling instruction-following models.

Proposed Method

Initialize with a small seed set of (instruction, output) pairs and a large unlabelled web corpus.
Self-augmentation: finetune a backward model on the seed pairs in reverse (output to instruction), then use it to generate candidate instructions for unlabelled web texts, creating candidate (instruction, output) pairs.
Self-curation: prompt the seed instruction model (finetuned on the seed data only) to rate each augmented pair on a 5-point scale, keep only the high-scoring examples for finetuning, and repeat the curation with the improved model to build a stronger one.
Tag seed and augmented data with distinct system prompts so the model can distinguish the two data sources during training and be steered at inference.
Experiment with 7B, 33B, and 65B LLaMA models, run two iterations of self-curation, and vary the amount of curated augmented data.
Evaluate via AlpacaEval (GPT-4 judgments) and human preferences, plus zero-shot NLP benchmarks.
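The self-augmentation step can be sketched as two small data transformations: flip each seed pair so a backward model learns to map an output to its instruction, then run that backward model over unlabelled web texts. This is a minimal sketch under assumptions: the `Pair` type and function names are hypothetical, `backward_model` stands in for any text-generation call, and the finetuning itself is out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Pair:
    """Hypothetical container for one (instruction, output) example."""
    instruction: str
    output: str


def backward_training_examples(seed: Iterable[Pair]) -> List[Tuple[str, str]]:
    """Flip seed pairs into (source, target) = (output, instruction) for the backward model."""
    return [(pair.output, pair.instruction) for pair in seed]


def self_augment(
    unlabelled_texts: Iterable[str],
    backward_model: Callable[[str], str],
) -> List[Pair]:
    """Treat each web text as an output and ask the backward model for a matching instruction."""
    return [Pair(instruction=backward_model(text), output=text) for text in unlabelled_texts]


if __name__ == "__main__":
    seed = [Pair("Summarize the passage.", "A short summary.")]
    print(backward_training_examples(seed))
    # Toy stand-in for the backward model; a real one would be a finetuned LLM.
    toy_backward = lambda text: "Explain: " + text.split()[0]
    for pair in self_augment(["Volcanoes erupt when pressure builds."], toy_backward):
        print(pair)
```

Because the backward model only learns from the small seed set, the quality of its predicted instructions varies, which is what the self-curation step is meant to filter.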
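Self-curation reduces to scoring each candidate and keeping those at or above a threshold. The sketch below assumes the rating model is prompted to end its reply with a line such as `Score: 4` on the 5-point scale; the regex, the `rater` callable, and the default threshold `k=4` are illustrative assumptions, not the paper's exact settings.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Assumed reply format: free-text reasoning followed by "Score: <1-5>".
SCORE_PATTERN = re.compile(r"Score:\s*([1-5])\b")


def parse_score(reply: str) -> Optional[int]:
    """Return the last 1-5 score found in the reply, or None if none is present."""
    matches = SCORE_PATTERN.findall(reply)
    return int(matches[-1]) if matches else None


def self_curate(
    candidates: Iterable[Tuple[str, str]],
    rater: Callable[[str, str], str],
    k: int = 4,
) -> List[Tuple[str, str]]:
    """Keep (instruction, output) pairs whose parsed score is at least k."""
    kept = []
    for instruction, output in candidates:
        score = parse_score(rater(instruction, output))
        # Replies without a parseable score are dropped rather than guessed.
        if score is not None and score >= k:
            kept.append((instruction, output))
    return kept
```

Raising `k` trades quantity for quality, which is the trade-off RQ3 asks about.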
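The iterative loop and the data tagging can be sketched together, with training and curation passed in as callables. Assumptions: `finetune` and `curate` are hypothetical stand-ins, the candidate pool is generated once and re-curated by each new model, and the two tag strings reflect our reading of the paper's system prompts (seed data in AI-Assistant style, augmented data as web-search knowledge).

```python
from typing import Callable, List, Tuple, TypeVar

Model = TypeVar("Model")
Example = Tuple[str, str]       # (instruction, output)
Tagged = Tuple[str, str, str]   # (system_prompt, instruction, output)

# Tag strings as we read them in the paper; treat them as assumptions.
SEED_TAG = "Answer in the style of an AI Assistant."
AUG_TAG = "Answer with knowledge from web search."


def tag(pairs: List[Example], system_prompt: str) -> List[Tagged]:
    """Attach a system prompt to every (instruction, output) pair."""
    return [(system_prompt, instruction, output) for instruction, output in pairs]


def iterate_self_alignment(
    seed: List[Example],
    augmented: List[Example],
    finetune: Callable[[List[Tagged]], Model],
    curate: Callable[[Model, List[Example]], List[Example]],
    iterations: int = 2,
) -> Model:
    """Train M0 on seed data, then alternate curation and joint retraining."""
    model = finetune(tag(seed, SEED_TAG))  # M0: seed data only
    for _ in range(iterations):
        # The current model filters the same candidate pool.
        selected = curate(model, augmented)
        # Joint training on seed plus curated data, each with its own tag.
        model = finetune(tag(seed, SEED_TAG) + tag(selected, AUG_TAG))
    return model
```

With `iterations=2` this produces M0 (seed only), then M1, then M2, where each later model is trained on data curated by the one before it.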

Experimental Results

Research Questions

RQ1: Can a seed instruction-following model bootstrap high-quality instruction data from a large unlabelled web corpus without external supervision?
RQ2: Does self-curation improve the quality of augmented data enough to warrant iterative retraining?
RQ3: How does data quality vs. quantity impact instruction-following performance in self-aligned models?
RQ4: What are the effects of data tagging and system prompts on training and inference?
RQ5: How does the approach scale across model sizes and compare to non-distilled baselines on standard benchmarks?

Key Findings

The self-augmentation plus self-curation pipeline (two iterations) yields a model (Humpback) that outperforms all other LLaMA-based models on the Alpaca leaderboard that do not rely on distillation data.
Training on high-quality augmented data significantly improves instruction-following performance compared to using all augmented data or seed data alone.
Emphasizing data quality yields larger gains than merely increasing data volume, in contrast with the superficial alignment hypothesis.
Jointly training on seed and self-augmented data, each tagged with an appropriate system prompt, improves both performance and safety-related behavior.
Scaling to larger models (e.g., 65B) with high-quality augmented data yields further improvements over smaller models.


This review was generated by AI and reviewed by human editors.