QUICK REVIEW

[Paper Review] CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration

Keming Ye, Zhou Zhao|arXiv (Cornell University)|Mar 26, 2026
Generative Adversarial Networks and Image Synthesis | Cited by 0
One-Sentence Summary

CIAR introduces an on-device interval-based uncertainty quantifier and cloud-enhanced decoding to accelerate autoregressive image generation, achieving about 2.18× speed-up and 70% fewer cloud requests while preserving image quality.

ABSTRACT

Auto-regressive (AR) models have recently made notable progress in image generation, achieving performance comparable to diffusion-based approaches. However, their computational intensity and sequential nature impede on-device deployment, causing disruptive latency. We address this via a cloud-device collaboration framework, CIAR, which utilizes on-device self-verification to handle two key properties of visual synthesis: the vast token vocabulary required for high-fidelity images, and inherent spatial redundancy, which leads to extreme predictability in homogeneous regions while object boundaries exhibit high uncertainty. Uniform verification wastes resources on such redundant tokens. Our solution centers on an on-device token uncertainty quantifier, which adopts continuous probability intervals instead of conventional discrete solution sets to accelerate processing and make it feasible for large visual vocabularies. Additionally, we incorporate an interval-enhanced decoding module to further speed up decoding while maintaining visual fidelity and semantic consistency via a distribution alignment training strategy. Extensive experiments demonstrate that CIAR achieves a 2.18× speed-up and reduces cloud requests by 70%, while preserving image quality compared to existing methods.

Motivation and Objectives

  • Motivate on-device acceleration for high-fidelity visual AR models with large token vocabularies and spatial redundancy.
  • Develop an interval-based uncertainty quantifier (Inter-Head) to selectively verify tokens and reduce unnecessary cloud communication.
  • Design interval-enhanced cloud decoding and a distribution alignment training strategy to maintain coherence between device and cloud outputs.
  • Demonstrate speedups and reduced cloud usage without sacrificing visual fidelity on standard benchmarks.

Proposed Method

  • Propose an on-device Interval Head (Inter-Head) that outputs center and radius logits to form probability intervals for each token.
  • Define a probability interval [p_t^l, p_t^u] and an interval-based uncertainty score that combines total interval width and dispersion.
  • Introduce cloud-enhanced decoding with prefix injection and interval-feature conditioning to align device and cloud distributions during decoding.
  • Adopt an interval-aware Distributionally Robust Optimization (Inter-DRO) loss to train the Inter-Head for distribution alignment with the cloud model.
  • Implement interval feature projection to condition the cloud decoder, reducing drift and improving coherence.
  • Conduct extensive experiments on multiple cloud models (LlamaGen-XL stages I/II, Anole) with MS-COCO captions as prompts.
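The interval construction in the bullets above can be illustrated with a small, hedged sketch. The summary does not say how the Inter-Head turns center and radius logits into probabilities, so this example assumes a simple interval bound through softmax: each token's logit lies in [center − radius, center + radius], the lower probability bound pairs the token's lowest logit with every other token's highest, and the upper bound does the reverse. The function names, the use of the standard deviation as the "dispersion" term, and the `dispersion_weight` parameter are all hypothetical choices for illustration, not the paper's definitions.

```python
import math


def _safe_ratio(own, others):
    """Return own / (own + others), guarding against round-off and 0/0."""
    total = own + max(others, 0.0)
    return own / total if total > 0.0 else 0.0


def softmax_interval(center, radius):
    """Bound each token's softmax probability from logit intervals.

    Assumption (not from the paper): token i's logit lies in
    [center[i] - radius[i], center[i] + radius[i]] with radius[i] >= 0.
    """
    lo_logits = [c - r for c, r in zip(center, radius)]
    hi_logits = [c + r for c, r in zip(center, radius)]
    shift = max(hi_logits)  # subtract the max for numerical stability
    lo_exp = [math.exp(x - shift) for x in lo_logits]
    hi_exp = [math.exp(x - shift) for x in hi_logits]
    sum_lo, sum_hi = sum(lo_exp), sum(hi_exp)
    # Lower bound: own logit at its minimum, all competitors at their maximum.
    p_lower = [_safe_ratio(l, sum_hi - h) for l, h in zip(lo_exp, hi_exp)]
    # Upper bound: own logit at its maximum, all competitors at their minimum.
    p_upper = [_safe_ratio(h, sum_lo - l) for l, h in zip(lo_exp, hi_exp)]
    return p_lower, p_upper


def interval_uncertainty(p_lower, p_upper, dispersion_weight=1.0):
    """Hypothetical score: total interval width plus weighted width dispersion."""
    widths = [u - l for l, u in zip(p_lower, p_upper)]
    total_width = sum(widths)
    mean_width = total_width / len(widths)
    dispersion = math.sqrt(
        sum((w - mean_width) ** 2 for w in widths) / len(widths)
    )
    return total_width + dispersion_weight * dispersion


# Toy 4-token vocabulary: the same centers with a narrow and a wide radius.
center = [4.0, 1.0, 0.5, 0.0]
for radius in ([0.05] * 4, [1.0] * 4):
    p_lo, p_hi = softmax_interval(center, radius)
    print(radius[0], round(interval_uncertainty(p_lo, p_hi), 4))
```

Under these assumptions the score is computed in a single pass over the vocabulary, which is consistent with the motivation given for using continuous intervals with large visual vocabularies; a wider radius yields a larger score, so such a token would be more likely to be sent to the cloud.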
Figure 1: (a) Acceptance analysis of Lantern. The pie chart shows the ratio of max-prob vs. other tokens, and the bar chart compares Lantern without verification to the baseline. (b) Comparison of decoding frameworks. From left to right: baseline, Lantern, and our CIAR with Inter-Head and cloud-devi

Experimental Results

Research Questions

  • RQ1: How can interval-based uncertainty estimation on-device reduce redundant verification in cloud-device AR image generation?
  • RQ2: Can interval-enhanced decoding with distribution alignment maintain image fidelity while reducing cloud interactions?
  • RQ3: What is the trade-off between prefix guidance rate and latency when using cloud-prefix injection in CIAR?
  • RQ4: How does continuous interval-based uncertainty compare to discrete solution enumeration in terms of latency and quality for large token vocabularies?

Key Findings

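| Metric | Models | Methods | CLIP (↑) | FID (↓) | F1 (↑) | HPSv2 (↑) | Latency (s) | Steps | Cloud Call |
|---|---|---|---|---|---|---|---|---|---|
| Base | LlamaGen (Stage I) | Base | 0.3161 | 23.6900 | 0.6097 | 22.74 | x1.00 | x1.00 | 100.00% |
| Eagle2 | LlamaGen (Stage I) | Ours | 0.3159 | 24.2459 | 0.5997 | 22.48 | x2.53 | x3.00 | 30.44% |
| Lantern | LlamaGen (Stage I) | Ours | 0.3159 | 24.5828 | 0.5796 | 22.03 | x1.70 | x2.05 | 52.34% |
| Entropy-Lens | LlamaGen (Stage I) | Ours | 0.3132 | 24.2459? | 0.5997? | 22.48 | x2.53 | x3.00 | 30.44% |
| CoDe (N = 0.3) | LlamaGen (Stage I) | Ours | 0.2822 | 40.0709 | 0.5350 | 23.84 | x1.00 | x1.00 | 100.00% |
|  | LlamaGen (Stage I) | Ours | 0.3159 | 24.2459 | 0.5997 | 22.48 | x2.53 | x3.00 | 30.44% |
| Base | LlamaGen (Stage II) | Base | 0.2822 | 40.0709 | 0.5350 | 23.84 | x1.00 | x1.00 | 100.00% |
| Eagle2 | LlamaGen (Stage II) | Ours | 0.3159 | 23.7103 | 0.6117 | 22.88 | x1.02 | x1.19 | 84.55% |
| Lantern | LlamaGen (Stage II) | Ours | 0.3181 | 23.9510 | 0.5969 | 22.92 | x1.25 | x1.81 | 50.35% |
| Entropy-Lens | LlamaGen (Stage II) | Ours | 0.2966 | 32.3533 | 0.5600 | 22.34 | x1.57 | x2.53 | 39.86% |
| CoDe (N = 0.3) | LlamaGen (Stage II) | Ours | 0.2781 | 36.7520 | 0.5597 | 21.94 | x1.55 | x2.89 | 30.00% |
| Anole | Anole | Ours | 0.3171 | 23.8593 | 0.5970 | 23.14 | x1.87 | x3.29 | 29.88% |
| Base | Anole | Base | 0.3215 | 19.9455 | 0.6544 | 23.52 | x1.00 | x1.00 | 100.00% |
| Eagle2 | Anole | Ours | 0.3159 | 23.7103 | 0.6117 | 22.88 | x1.02 | x1.09 | 91.98% |
| Lantern | Anole | Ours | 0.3181 | 23.9510 | 0.5969 | 22.92 | x1.25 | x1.81 | 50.35% |
| Entropy-Lens | Anole | Ours | 0.2966 | 32.3533 | 0.5600 | 22.34 | x1.57 | x2.53 | 39.86% |
| CoDe (N = 0.3) | Anole | Ours | 0.2781 | 36.7520 | 0.5597 | 21.94 | x1.55 | x2.89 | 30.00% |
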
  • CIAR achieves a 2.18× speed-up and reduces cloud requests by 70% versus state-of-the-art speculative decoding methods.
  • CIAR maintains or improves visual fidelity metrics (CLIP, FID, F1, HPSv2) across evaluated models.
  • The Inter-Head interval-based uncertainty provides better balance between local token acceptance and cloud offloading than entropy-based or random baselines.
  • Interval-enhanced decoding with interval feature conditioning sustains distribution alignment and improves detail coherence.
  • A prefix injection strategy reduces unnecessary cloud requests while preserving image quality, with an optimal prefix rate balancing guidance and latency.
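The routing behaviour summarized in these findings (a cloud-generated prefix, local acceptance of confident tokens, cloud verification of uncertain ones) can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `cloud_prefix`, `device_step`, and `cloud_verify` are hypothetical stand-ins for the real models, the prefix is assumed to cost one cloud request, each uncertain token is assumed to trigger exactly one cloud request, and the interval features that CIAR sends alongside uncertain tokens are not modelled.

```python
def collaborative_decode(num_tokens, cloud_prefix, device_step, cloud_verify,
                         threshold, prefix_len):
    """Decode num_tokens tokens, calling the cloud only for uncertain ones.

    cloud_prefix(n)             -> list of n prefix tokens (hypothetical cloud model)
    device_step(tokens)         -> (draft_token, uncertainty) (hypothetical device model)
    cloud_verify(tokens, draft) -> verified token (hypothetical cloud model)
    """
    tokens = list(cloud_prefix(prefix_len))
    cloud_calls = 1 if prefix_len > 0 else 0  # assumption: prefix costs one request
    while len(tokens) < num_tokens:
        draft, uncertainty = device_step(tokens)
        if uncertainty > threshold:
            # Uncertain token: defer to the cloud model.
            draft = cloud_verify(tokens, draft)
            cloud_calls += 1
        # Confident tokens are accepted locally with no cloud round trip.
        tokens.append(draft)
    return tokens, cloud_calls


# Toy stand-ins: every 4th position plays the role of an uncertain
# "object boundary" token; all other positions are confident.
def toy_cloud_prefix(n):
    return [0] * n


def toy_device_step(tokens):
    pos = len(tokens)
    uncertainty = 0.9 if pos % 4 == 0 else 0.1
    return pos % 7, uncertainty


def toy_cloud_verify(tokens, draft):
    return draft  # the toy cloud model simply agrees with the draft


tokens, calls = collaborative_decode(
    num_tokens=16, cloud_prefix=toy_cloud_prefix, device_step=toy_device_step,
    cloud_verify=toy_cloud_verify, threshold=0.5, prefix_len=4)
print(len(tokens), calls)
```

In this toy setup, raising `threshold` lowers the number of cloud requests at the cost of accepting more unverified device tokens, while `prefix_len` controls how much cloud guidance is injected up front; the summary reports that an intermediate prefix rate balances guidance against latency.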
Figure 2: Overview of CIAR. (a) The cloud-side AR model generates image token prefixes from the input prompt. These prefixes are then sent to (b) a lightweight device model whose Inter-Head accepts confident tokens locally and sends uncertain ones with interval features to the cloud for verification

This review was generated by AI and checked by human editors.