QUICK REVIEW

[Paper Review] Applied Federated Learning: Improving Google Keyboard Query Suggestions

Timothy T. Yang, Galen Andrew | arXiv (Cornell University) | Dec 7, 2018

Privacy-Preserving Technologies in Data | 10 references | 449 citations

TL;DR

The paper demonstrates end-to-end use of federated learning to train, evaluate, and deploy a triggering model on mobile devices to filter Google Keyboard query suggestions without accessing raw user data, improving CTR while preserving privacy.

ABSTRACT

Federated learning is a distributed form of machine learning where both the training data and model training are decentralized. In this paper, we use federated learning in a commercial, global-scale setting to train, evaluate and deploy a model to improve virtual keyboard search suggestion quality without direct access to the underlying user data. We describe our observations in federated training, compare metrics to live deployments, and present resulting quality increases. In whole, we demonstrate how federated learning can be applied end-to-end to both improve user experiences and enhance user privacy.

Motivation & Objective

Demonstrate an end-to-end FL workflow for a commercial mobile keyboard feature.
Assess privacy benefits and performance of on-device FL training and aggregation.
Show how a triggering model can improve query suggestion quality without central data access.

Proposed method

Two-stage recommendation system: server-trained baseline model plus FL-trained triggering model.
On-device data collection of features and labels (impressions/clicks) for FL tasks.
Federated Averaging for aggregating client updates into a global model without central data access.
On-device evaluation and monitoring to guide model convergence and deployment.
Threshold-based triggering to balance CTR and retained impressions.
Logistic regression as the FL model in initial experiments, with potential extension to neural models.
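
The Federated Averaging and logistic regression items above can be illustrated with a minimal, pure-Python sketch. It is not the paper's production implementation: the synthetic data, learning rate, number of local epochs, and all function names (client_update, federated_averaging_round, make_synthetic_clients) are assumptions made for illustration. Each simulated client runs local SGD on its own examples, and the server only sees the resulting weights, which it averages in proportion to each client's example count.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    # weights[0] is the bias; the remaining entries align with features.
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def client_update(global_weights, examples, lr=0.1, epochs=1):
    """Local SGD on one client's (features, label) pairs; returns new weights."""
    w = list(global_weights)
    for _ in range(epochs):
        for features, label in examples:
            err = predict(w, features) - label  # d(log loss)/d(logit)
            w[0] -= lr * err
            for i, x in enumerate(features):
                w[i + 1] -= lr * err * x
    return w


def federated_averaging_round(global_weights, client_datasets, lr=0.1, epochs=1):
    """One round: clients train locally, server averages by example count."""
    total = sum(len(data) for data in client_datasets)
    new_weights = [0.0] * len(global_weights)
    for data in client_datasets:
        local = client_update(global_weights, data, lr, epochs)
        share = len(data) / total  # this client's weight in the average
        for i, w in enumerate(local):
            new_weights[i] += share * w
    return new_weights


def log_loss(weights, examples):
    # Mean binary cross-entropy, clipped to avoid log(0).
    eps = 1e-12
    total = 0.0
    for features, label in examples:
        p = min(max(predict(weights, features), eps), 1 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)


def make_synthetic_clients(num_clients=5, per_client=40, seed=0):
    # Hypothetical stand-in for on-device impression/click data.
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        data = []
        for _ in range(per_client):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            label = 1 if x[0] + 0.5 * x[1] > 0 else 0
            data.append((x, label))
        clients.append(data)
    return clients


clients = make_synthetic_clients()
all_examples = [ex for data in clients for ex in data]
weights = [0.0, 0.0, 0.0]
loss_before = log_loss(weights, all_examples)
for _ in range(20):
    weights = federated_averaging_round(weights, clients)
loss_after = log_loss(weights, all_examples)
```

In this toy setup the pooled loss is computed only to check convergence; in a real federated deployment the server would not hold all_examples, and evaluation would itself run on devices.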

Experimental results

Research questions

RQ1: Can federated learning on mobile devices improve the quality of Gboard's query suggestions without accessing raw user data?
RQ2: What are the practical training dynamics, constraints, and privacy implications when deploying FL end-to-end in production?
RQ3: How does the FL-trained triggering model affect click-through rate and retained impressions compared to a traditional baseline?
RQ4: What challenges arise from diurnal device availability and population skew in FL for on-device privacy-preserving training?

Key findings

The FL-trained triggering model improves click-through rate (CTR) over the baseline in live deployments at selected thresholds.
Training exhibits diurnal patterns: most rounds occur at night when devices are charging on unmetered networks.
Evaluation shows training and live metrics can diverge due to population skew and environmental constraints.
Threshold tuning affects the balance between triggering rate and user experience, influencing retained impressions and clicks.
Logistic regression provided an interpretable and effective starting point for FL in this setting; later iterations incorporated more complex features including LSTM-based text featurization.
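
The threshold trade-off noted in the findings above can be made concrete with a small sketch. The scored log below is invented for illustration, and the function name triggering_metrics and the metric definitions are assumptions rather than the paper's exact formulas. A suggestion is shown only when the triggering model's score meets the threshold; raising the threshold tends to raise CTR while discarding impressions and some clicks.

```python
def triggering_metrics(scored_impressions, threshold):
    """Compute CTR and retention when only scores >= threshold are shown.

    scored_impressions: list of (score, clicked) pairs, clicked in {0, 1}.
    """
    total_impressions = len(scored_impressions)
    total_clicks = sum(clicked for _, clicked in scored_impressions)
    kept = [(s, c) for s, c in scored_impressions if s >= threshold]
    kept_clicks = sum(c for _, c in kept)
    return {
        # Clicks per shown suggestion after filtering.
        "ctr": kept_clicks / len(kept) if kept else 0.0,
        # Fraction of the original impressions still shown.
        "retained_impressions": len(kept) / total_impressions,
        # Fraction of the original clicks still captured.
        "retained_clicks": kept_clicks / total_clicks if total_clicks else 0.0,
    }


# Hypothetical scored log: (triggering score, whether the user clicked).
log = [
    (0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
    (0.4, 0), (0.3, 0), (0.2, 1), (0.1, 0),
]

no_filter = triggering_metrics(log, threshold=0.0)
filtered = triggering_metrics(log, threshold=0.5)
```

On this toy log, a threshold of 0.5 lifts CTR from 0.5 to 0.75 while keeping half of the impressions and three quarters of the clicks, which is the kind of balance a deployment threshold has to strike.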

This review was created by AI and reviewed by human editors.