QUICK REVIEW

[Paper Review] CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration

Keming Ye, Zhou Zhao|arXiv (Cornell University)|Mar 26, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

One-Sentence Summary

CIAR introduces an on-device interval-based uncertainty quantifier and cloud-enhanced decoding to accelerate autoregressive image generation, achieving about 2.18× speed-up and 70% fewer cloud requests while preserving image quality.

ABSTRACT

Auto-regressive (AR) models have recently made notable progress in image generation, achieving performance comparable to diffusion-based approaches. However, their computational intensity and sequential nature impede on-device deployment, causing disruptive latency. We address this via a cloud-device collaboration framework, CIAR, which utilizes on-device self-verification to handle two key properties of visual synthesis: the vast token vocabulary required for high-fidelity images, and the inherent spatial redundancy that makes homogeneous regions extremely predictable while object boundaries exhibit high uncertainty. Uniform verification wastes resources on such redundant tokens. Our solution centers on an on-device token uncertainty quantifier, which adopts continuous probability intervals instead of conventional discrete solution sets to accelerate processing and make it feasible for large visual vocabularies. Additionally, we incorporate an Interval-enhanced decoding module to further speed up decoding while maintaining visual fidelity and semantic consistency via a distribution alignment training strategy. Extensive experiments demonstrate that CIAR achieves a 2.18x speed-up and reduces cloud requests by 70%, while preserving image quality compared to existing methods.

Motivation and Objectives

Motivate on-device acceleration for high-fidelity visual AR models with large token vocabularies and spatial redundancy.
Develop an interval-based uncertainty quantifier (Inter-Head) to selectively verify tokens and reduce unnecessary cloud communication.
Design interval-enhanced cloud decoding and a distribution alignment training strategy to maintain coherence between device and cloud outputs.
Demonstrate speedups and reduced cloud usage without sacrificing visual fidelity on standard benchmarks.

Proposed Method

Propose on-device Interval Head (Inter-Head) that outputs center and radius logits to form probability intervals for each token.
Define a probability interval [p_t^l, p_t^u] for each token and an interval-based uncertainty score that combines total interval width and dispersion.
Introduce Cloud-Enhanced decoding with prefix injection and interval-feature conditioning to align device and cloud distributions during decoding.
Adopt an interval-aware Distributionally Robust Optimization (Inter-DRO) loss to train the Inter-Head for distribution alignment with the cloud model.
Implement interval feature projection to condition the cloud decoder, reducing drift and improving coherence.
Conduct extensive experiments on multiple cloud models (LlamaGen-XL stages I/II, Anole) with MS-COCO captions as prompts.
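
The summary does not give the Inter-Head formulas, so the following is only a minimal sketch of one way center and radius logits could be turned into per-token probability intervals and a scalar uncertainty score. It assumes each logit may lie anywhere in [center - radius, center + radius] and uses the standard worst-case softmax bounds; the function names, the `weight` parameter, and the use of the standard deviation of interval widths as the "dispersion" term are all hypothetical choices, not the authors' definitions.

```python
import math

def interval_softmax(center, radius):
    """Bound each token's softmax probability when logit i may lie anywhere
    in [center[i] - radius[i], center[i] + radius[i]] (assumed form).
    Requires a vocabulary of at least two tokens."""
    lo_logits = [c - r for c, r in zip(center, radius)]
    hi_logits = [c + r for c, r in zip(center, radius)]
    shift = max(hi_logits)  # subtract the max for numerical stability
    lo_exp = [math.exp(v - shift) for v in lo_logits]
    hi_exp = [math.exp(v - shift) for v in hi_logits]
    sum_lo, sum_hi = sum(lo_exp), sum(hi_exp)
    p_lower, p_upper = [], []
    for lo, hi in zip(lo_exp, hi_exp):
        # Smallest p_i: own logit at its minimum, all rivals at their maximum.
        p_lower.append(lo / (lo + (sum_hi - hi)))
        # Largest p_i: own logit at its maximum, all rivals at their minimum.
        p_upper.append(hi / (hi + (sum_lo - lo)))
    return p_lower, p_upper

def uncertainty_score(p_lower, p_upper, weight=1.0):
    """Hypothetical score: total interval width plus a weighted dispersion
    term (population standard deviation of the per-token widths)."""
    widths = [u - l for l, u in zip(p_lower, p_upper)]
    total_width = sum(widths)
    mean = total_width / len(widths)
    dispersion = math.sqrt(sum((w - mean) ** 2 for w in widths) / len(widths))
    return total_width + weight * dispersion
```

With all radii set to zero the two bounds coincide with the ordinary softmax and the score is zero; widening the radii widens the intervals and raises the score, which is the behaviour a token-level verifier would need.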

Figure 1: (a) Acceptance analysis of Lantern. The pie chart shows the ratio of max-prob vs. other tokens, and the bar chart compares Lantern without verification to the baseline. (b) Comparison of decoding frameworks. From left to right: baseline, Lantern, and our CIAR with Inter-Head and cloud-devi

Experimental Results

Research Questions

RQ1: How can interval-based uncertainty estimation on-device reduce redundant verification in cloud-device AR image generation?
RQ2: Can interval-enhanced decoding with distribution alignment maintain image fidelity while reducing cloud interactions?
RQ3: What is the trade-off between prefix guidance rate and latency when using cloud-prefix injection in CIAR?
RQ4: How does continuous interval-based uncertainty compare to discrete solution enumeration in terms of latency and quality for large token vocabularies?

Key Findings

Metric	Models	Methods	CLIP (↑)	FID (↓)	F1 (↑)	HPSv2 (↑)	Latency (speed-up)	Steps (ratio)	Cloud calls
Base	LlamaGen(Stage I)	Base	0.3161	23.6900	0.6097	22.74	x1.00	x1.00	100.00%
Eagle2	LlamaGen(Stage I)	Ours	0.3159	24.2459	0.5997	22.48	x2.53	x3.00	30.44%
Lantern	LlamaGen(Stage I)	Ours	0.3159	24.5828	0.5796	22.03	x1.70	x2.05	52.34%
Entropy-Lens	LlamaGen(Stage I)	Ours	0.3132	24.2459?	0.5997?	22.48	x2.53	x3.00	30.44%
CoDe (N = 0.3)	LlamaGen(Stage I)	Ours	0.2822	40.0709	0.5350	23.84	x1.00	x1.00	100.00%
LlamaGen(Stage I)	Ours	0.3159	24.2459	0.5997	22.48	x2.53	x3.00	30.44%
Base	LlamaGen(Stage II)	Base	0.2822	40.0709	0.5350	23.84	x1.00	x1.00	100.00%
Eagle2	LlamaGen(Stage II)	Ours	0.3159	23.7103	0.6117	22.88	x1.02	x1.19	84.55%
Lantern	LlamaGen(Stage II)	Ours	0.3181	23.9510	0.5969	22.92	x1.25	x1.81	50.35%
Entropy-Lens	LlamaGen(Stage II)	Ours	0.2966	32.3533	0.5600	22.34	x1.57	x2.53	39.86%
CoDe (N = 0.3)	LlamaGen(Stage II)	Ours	0.2781	36.7520	0.5597	21.94	x1.55	x2.89	30.00%
Anole	Anole	Ours	0.3171	23.8593	0.5970	23.14	x1.87	x3.29	29.88%
Base	Anole	Base	0.3215	19.9455	0.6544	23.52	x1.00	x1.00	100.00%
Eagle2	Anole	Ours	0.3159	23.7103	0.6117	22.88	x1.02	x1.09	91.98%
Lantern	Anole	Ours	0.3181	23.9510	0.5969	22.92	x1.25	x1.81	50.35%
Entropy-Lens	Anole	Ours	0.2966	32.3533	0.5600	22.34	x1.57	x2.53	39.86%
CoDe (N = 0.3)	Anole	Ours	0.2781	36.7520	0.5597	21.94	x1.55	x2.89	30.00%

CIAR achieves a 2.18× speed-up and reduces cloud requests by 70% versus state-of-the-art speculative decoding methods.
CIAR maintains or improves visual fidelity metrics (CLIP, FID, F1, HPSv2) across evaluated models.
The Inter-Head interval-based uncertainty provides better balance between local token acceptance and cloud offloading than entropy-based or random baselines.
Interval-enhanced decoding with interval feature conditioning sustains distribution alignment and improves detail coherence.
A prefix injection strategy reduces unnecessary cloud requests while preserving image quality, with an optimal prefix rate balancing guidance and latency.
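
The link between local acceptance and cloud usage can be illustrated with a toy routing rule: tokens whose uncertainty score exceeds a threshold are sent to the cloud, and the rest are accepted on-device. This is a minimal sketch, not the paper's decision rule; the function names, the threshold, and the scores below are invented for illustration and are not measurements from the paper.

```python
def route_tokens(scores, threshold):
    """Per-token decision: True = send to cloud, False = accept locally."""
    return [s > threshold for s in scores]

def cloud_call_rate(decisions):
    """Fraction of tokens that required a cloud request."""
    return sum(decisions) / len(decisions)

# Made-up uncertainty scores: low in homogeneous regions, high at boundaries.
scores = [0.02, 0.01, 0.40, 0.03, 0.55, 0.02, 0.01, 0.61, 0.04, 0.02]
decisions = route_tokens(scores, threshold=0.3)
rate = cloud_call_rate(decisions)
reduction = 1.0 - rate  # relative to verifying every token in the cloud
print(f"cloud call rate: {rate:.0%}, reduction: {reduction:.0%}")
```

In this toy run 3 of 10 tokens exceed the threshold, so the cloud call rate is 30% and the reduction against verifying every token is 70%. Raising the threshold lowers cloud usage further, at the risk of accepting uncertain tokens locally.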

Figure 2: Overview of CIAR. (a) The cloud-side AR model generates image token prefixes from the input prompt. These prefixes are then sent to (b) a lightweight device model with Inter-Head, which accepts confident tokens locally and sends uncertain ones with interval features to the cloud for verification

This review was generated by AI and checked by a human editor.