QUICK REVIEW

[Paper Review] Taming Preconditioner Drift: Unlocking the Potential of Second-Order Optimizers for Federated Learning on Non-IID Data

Junkang Liu, Fanhua Shang|arXiv (Cornell University)|Feb 22, 2026
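Stochastic Gradient Optimization Techniques|Citations: 0
One-Sentence Summary

This paper identifies preconditioner drift as the core source of instability in second-order optimization for federated learning on non-IID data, and introduces FedPAC, which aligns and corrects local preconditioners to make convergence faster and more stable on vision and language tasks.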

ABSTRACT

Second-order optimizers can significantly accelerate large-scale training, yet their naive federated variants are often unstable or even diverge on non-IID data. We show that a key culprit is \emph{preconditioner drift}: client-side second-order training induces heterogeneous \emph{curvature-defined geometries} (i.e., preconditioner coordinate systems), and server-side model averaging updates computed under incompatible metrics, corrupting the global descent direction. To address this geometric mismatch, we propose \texttt{FedPAC}, a \emph{preconditioner alignment and correction} framework for reliable federated second-order optimization. \texttt{FedPAC} explicitly decouples parameter aggregation from geometry synchronization by: (i) \textbf{Alignment} (i.e., aggregating local preconditioners into a global reference and warm-starting clients via global preconditioner); and (ii) \textbf{Correction} (i.e., steering local preconditioned updates using a global preconditioned direction to suppress long-term drift). We provide drift-coupled non-convex convergence guarantees with linear speedup under partial participation. Empirically, \texttt{FedPAC} consistently improves stability and accuracy across vision and language tasks, achieving up to $5.8\%$ absolute accuracy gain on CIFAR-100 with ViTs. Code is available at https://anonymous.4open.science/r/FedPAC-8B24.

Motivation and Objectives

  • Identify the cause of instability in federated second-order optimization on non-IID data (preconditioner drift).
  • Propose a unified framework FedPAC to align global and local preconditioners while correcting updates.
  • Provide convergence guarantees and empirical evidence of improved stability and accuracy across vision and language tasks.

Proposed Method

  • Define and measure preconditioner drift as the discrepancy between local and global preconditioners across clients.
  • Propose FedPAC, which decouples parameter aggregation from geometry synchronization through two components: alignment and correction.
  • Instantiate FedPAC on top of Sophia, Muon, and SOAP to yield FedPAC_Sophia, FedPAC_Muon, and FedPAC_SOAP.
  • Implement alignment by aggregating local preconditioners into a global reference and warm-starting clients with it.
  • Implement correction by combining a local preconditioned update with a global direction using a trade-off parameter beta.
  • Provide drift-coupled non-convex convergence guarantees showing a reduced drift term and faster convergence.
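
The review does not give FedPAC's update formulas, so the following is only a minimal sketch of the alignment and correction steps listed above, under stated assumptions: a diagonal preconditioner (a vector of second-moment estimates), plain averaging for alignment, and a convex combination weighted by beta for correction. The function names (`preconditioner_drift`, `align`, `corrected_step`) and the drift metric are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def preconditioner_drift(local_precs, global_prec):
    """Assumed drift metric: mean relative L2 distance between each
    local preconditioner and the global reference."""
    denom = np.linalg.norm(global_prec) + 1e-12
    return float(np.mean([np.linalg.norm(p - global_prec) / denom
                          for p in local_precs]))


def align(local_precs):
    """Alignment (assumed form): aggregate local preconditioners into a
    global reference by a plain element-wise mean."""
    return np.mean(np.stack(local_precs), axis=0)


def corrected_step(grad, local_prec, global_dir, beta, lr, eps=1e-8):
    """Correction (assumed form): blend the local preconditioned direction
    with a global preconditioned direction; beta is the trade-off weight."""
    local_dir = grad / (np.sqrt(local_prec) + eps)
    return -lr * ((1.0 - beta) * local_dir + beta * global_dir)


# Toy round with three clients and a 4-dimensional diagonal preconditioner.
rng = np.random.default_rng(0)
local_precs = [rng.uniform(0.5, 2.0, size=4) for _ in range(3)]
global_prec = align(local_precs)
print("drift before warm start:", preconditioner_drift(local_precs, global_prec))

# Warm start: every client begins the next round from the global reference.
warm_precs = [global_prec.copy() for _ in local_precs]
print("drift after warm start:", preconditioner_drift(warm_precs, global_prec))
```

In this sketch, beta = 0 recovers a purely local preconditioned step and beta = 1 follows only the global direction; the ablation summarized in this review reports that a value around 0.5 is robust.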
Figure 1 : (a) In non-IID FL, first-order methods converge slowly, inducing little client drift. (b) Second-order methods converge faster locally and thus drift toward local optima, causing the aggregated global model to deviate from global optimum. (c) FedPAC corrects local second-order updates, yi

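Experimental Results

Research Questions

  • RQ1: Can preconditioner drift explain why naive federated second-order methods underperform on non-IID data?
  • RQ2: Does FedPAC effectively align and correct local preconditioners to recover or exceed first-order FL performance in heterogeneous settings?
  • RQ3: What are the convergence guarantees for FedSOA and FedPAC under standard smoothness and bounded-heterogeneity assumptions?
  • RQ4: How do FedPAC variants perform across CNNs, Vision Transformers, and language models in IID vs. non-IID regimes?
  • RQ5: What is the role of the correction strength beta in FedPAC performance?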

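Key Findings

  • Second-order federated methods suffer from preconditioner drift, which degrades global convergence on non-IID data.
  • FedPAC reduces preconditioner drift and yields faster, more stable convergence across CNNs, ViTs, and language models.
  • FedPAC variants consistently improve accuracy over local second-order baselines, especially under strong data heterogeneity (Dirichlet partitions).
  • FedPAC achieves significant gains over baselines on CIFAR-100 and Tiny-ImageNet, and shows strong performance in C4 pre-training with LLaMA models.
  • The theoretical results give drift-coupled convergence guarantees, with FedPAC eliminating the explicit heterogeneity term and reducing drift-related noise.
  • Ablation studies confirm that both alignment and correction are necessary, and that beta around 0.5 provides robust performance.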
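
The findings above refer to Dirichlet partitions as the source of data heterogeneity, but the review does not state the exact protocol or concentration values used. As a rough illustration, here is a common label-based Dirichlet split; the function name `dirichlet_partition` and the `alpha` values are assumptions for this sketch. Smaller `alpha` gives each client a more skewed label distribution.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, drawing per-class client
    proportions from a symmetric Dirichlet(alpha) distribution."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Fraction of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices


# Toy usage: 5 classes with 20 samples each, split across 4 clients.
toy_labels = np.repeat(np.arange(5), 20)
parts = dirichlet_partition(toy_labels, num_clients=4, alpha=0.1, seed=1)
for client_id, part in enumerate(parts):
    print(client_id, np.bincount(toy_labels[part], minlength=5))
```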
(a) ResNet-18, IID


This review was generated by AI and checked by a human editor.