QUICK REVIEW

[Paper Review] CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Yixin Nie, Lin Guan|arXiv (Cornell University)|Mar 2, 2026

ICT in Developing Communities | Citations: 0

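One-Line Summary

CharacterFlywheel describes an iterative, production-scale flywheel for improving engagement and steerability across Meta's social apps. By combining data curation, reward modeling, SFT, RL, and offline/online evaluation, it achieves consistent online engagement gains and better steerability.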

ABSTRACT

This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, we refined models across 15 generations using data from both internal and external real-user traffic. Through continuous deployments from July 2024 to April 2025, we conducted controlled 7-day A/B tests showing consistent engagement improvements: 7 of 8 newly deployed models demonstrated positive lift over the baseline, with the strongest performers achieving up to 8.8% improvement in engagement breadth and 19.4% in engagement depth. We also observed substantial gains in steerability, with instruction following increasing from 59.2% to 84.8% and instruction violations decreasing from 26.6% to 5.8%. We detail the CharacterFlywheel process which integrates data curation, reward modeling to estimate and interpolate the landscape of engagement metrics, supervised fine-tuning (SFT), reinforcement learning (RL), and both offline and online evaluation to ensure reliable progress at each optimization step. We also discuss our methods for overfitting prevention and navigating production dynamics at scale. These contributions advance the scientific rigor and understanding of LLMs in social applications serving millions of users.

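Motivation and Objectives

Improve the breadth and depth of engagement for social chat LLMs on Instagram, WhatsApp, Messenger, and the Web.
Develop a scalable, iterative workflow that integrates data curation, reward modeling, supervised fine-tuning, and reinforcement learning.
Increase character steerability in production deployments and reduce safety and instruction violations.
Use offline and online evaluation methods to guide iterative refinement.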

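Proposed Method

An iterative development cycle spanning 15 model generations, including deployments from July 2024 to April 2025.
A data pipeline that combines internal feedback with curated production data to build training sets.
Reward models, including Bradley-Terry preference models (pointwise and pairwise), plus auxiliary user-signal models.
Rejection sampling to create near-on-policy post-training data for the RL objective.
Supervised fine-tuning (SFT) on Llama 3.1 70B, followed by DPO and online RL (a GRPO variant) to optimize engagement.
Artifact mitigation to avoid overfitting to superficial style features such as response length and emoji use.
Offline evaluation with community benchmarks and human comparisons; online A/B tests on 10% of traffic to measure engagement lift.
Safety and privacy controls that include layered evaluation, fail-closed design, and upstream privacy checks.
Image generation as part of character interactions, included to increase engagement.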
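
The pairwise Bradley-Terry objective named in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the function names and the toy scores are hypothetical, and a real reward model would produce the scores with a neural network rather than take them as inputs.

```python
import math


def bt_preference_prob(score_chosen: float, score_rejected: float) -> float:
    """P(chosen is preferred) under Bradley-Terry: sigmoid of the score gap."""
    gap = score_chosen - score_rejected
    # Branch on the sign of the gap so exp() never overflows.
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    e = math.exp(gap)
    return e / (1.0 + e)


def bt_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean negative log-likelihood over (chosen_score, rejected_score) pairs."""
    total = 0.0
    for chosen, rejected in pairs:
        gap = chosen - rejected
        # -log(sigmoid(gap)) equals softplus(-gap); this form is numerically stable.
        total += max(-gap, 0.0) + math.log1p(math.exp(-abs(gap)))
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for (preferred, dispreferred) responses.
    toy_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(bt_pairwise_loss(toy_pairs))
```

Minimizing this loss pushes the score of the preferred response above the score of the dispreferred one; when the two scores are equal, the predicted preference probability is 0.5.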

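Experimental Results

Research Questions

RQ1: How can an iterative, production-scale flywheel be used to consistently improve engagement metrics for social chat LLMs?
RQ2: How do reward modeling and RL strategies affect engagement breadth/depth and steerability in production?
RQ3: How should offline and online evaluation guide model selection and deployment decisions?
RQ4: Which safety and privacy mechanisms are essential for scaling to millions of users in social apps?
RQ5: How do data curation and rejection sampling help shape training without overfitting to superficial cues?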
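
As a rough illustration of rejection sampling as a data-selection step (RQ5), the sketch below keeps only the highest-scoring candidate responses under a reward function. The names `rejection_sample` and `toy_scores` and the threshold value are hypothetical; this summary does not specify how the actual pipeline scores or filters samples.

```python
from typing import Callable, Optional, Sequence


def rejection_sample(
    candidates: Sequence[str],
    reward_fn: Callable[[str], float],
    keep: int = 1,
    min_reward: Optional[float] = None,
) -> list[str]:
    """Return up to `keep` candidates with the highest reward.

    Candidates scoring below `min_reward` (if given) are rejected outright.
    """
    # Score each candidate once so reward_fn is not called repeatedly.
    scored = [(reward_fn(c), c) for c in candidates]
    if min_reward is not None:
        scored = [(r, c) for r, c in scored if r >= min_reward]
    # Sort by reward only, highest first; ties keep their original order.
    scored.sort(key=lambda rc: rc[0], reverse=True)
    return [c for _, c in scored[:keep]]


if __name__ == "__main__":
    # Hypothetical rewards standing in for a learned reward model.
    toy_scores = {"reply A": 0.1, "reply B": 0.9, "reply C": 0.5}
    best = rejection_sample(list(toy_scores), toy_scores.get, keep=2)
    print(best)
```

Because the kept samples come from the current policy's own generations, the resulting training data stays close to what the policy would actually produce.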

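Key Findings

7 of 8 newly deployed models showed lift over the baseline in engagement breadth and depth in 7-day A/B tests.
The strongest models achieved up to 8.8% improvement in engagement breadth and 19.4% in engagement depth.
Instruction following (steerability) improved from 59.2% to 84.8%.
Instruction violations (steerability) decreased from 26.6% to 5.8%.
Fifteen generations of CharacterFlywheel models were developed between January 2024 and September 2025, with a large-scale public rollout on July 29, 2024.
Both offline reward-model win rates and online engagement metrics were used to guide deployment decisions.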
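
The steerability figures above are before/after rates, so it can help to separate the change in percentage points from the relative change. The short sketch below does that arithmetic for the two reported pairs (59.2% to 84.8% and 26.6% to 5.8%); the helper names are hypothetical and the calculation is plain arithmetic, not a method taken from the report.

```python
def absolute_change_pp(before_pct: float, after_pct: float) -> float:
    """Change in percentage points between two rates given in percent."""
    return after_pct - before_pct


def relative_change_pct(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting rate, expressed in percent."""
    return (after_pct - before_pct) / before_pct * 100.0


if __name__ == "__main__":
    reported = {
        "instruction following": (59.2, 84.8),
        "instruction violations": (26.6, 5.8),
    }
    for name, (before, after) in reported.items():
        pp = absolute_change_pp(before, after)
        rel = relative_change_pct(before, after)
        print(f"{name}: {pp:+.1f} pp, {rel:+.1f}% relative")
```

Instruction following rises by 25.6 percentage points (about 43% relative to its starting rate), while instruction violations fall by 20.8 percentage points (about 78% relative).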


This review was generated by AI and checked by a human editor.