QUICK REVIEW

[Paper Review] CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration

Keming Ye, Zhou Zhao|arXiv (Cornell University)|2026. 03. 26.

Generative Adversarial Networks and Image Synthesis | Citations: 0

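One-Line Summary

CIAR accelerates autoregressive image generation with an on-device interval-based uncertainty quantifier and cloud-enhanced decoding, achieving a roughly 2.18× speed-up and a 70% reduction in cloud requests while maintaining image quality.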

ABSTRACT

Auto-regressive (AR) models have recently made notable progress in image generation, achieving performance comparable to diffusion-based approaches. However, their computational intensity and sequential nature impede on-device deployment, causing disruptive latency. We address this via a cloud-device collaboration framework, CIAR, which utilizes on-device self-verification to handle two key properties of visual synthesis: the vast token vocabulary required for high-fidelity images and inherent spatial redundancy, which leads to extreme predictability in homogeneous regions, while object boundaries exhibit high uncertainty. Uniform verification wastes resources on such redundant tokens. Our solution centers on an on-device token uncertainty quantifier, which adopts continuous probability intervals to accelerate processing and make it feasible for large visual vocabularies instead of conventional discrete solution sets. Additionally, we incorporate an Interval-enhanced decoding module to further speed up decoding while maintaining visual fidelity and semantic consistency via a distribution alignment training strategy. Extensive experiments demonstrate that CIAR achieves a 2.18x speed-up and reduces cloud requests by 70%, while preserving image quality compared to existing methods.

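Research Motivation and Goals

Motivate the need for local acceleration of high-resolution visual AR models, which combine large token vocabularies with spatial redundancy.
Develop an Interval-Based Uncertainty Quantifier (Inter-Head) that verifies tokens selectively and cuts unnecessary cloud communication.
Design interval-enhanced cloud decoding and a distribution alignment training strategy to keep device and cloud outputs consistent.
Demonstrate speed-up and reduced cloud usage on standard benchmarks without loss of visual fidelity.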

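Proposed Method

Propose an on-device Interval Head (Inter-Head) that outputs center and radius logits for each token, which together form a probability interval.
Define the probability interval [p_t^l, p_t^u] and an interval-based uncertainty score that combines total interval width and variance.
Introduce Cloud-Enhanced decoding, which aligns the device and cloud distributions during decoding through prefix injection and interval feature conditioning.
Train the Inter-Head with an interval-aware Distributionally Robust Optimization (Inter-DRO) loss to align its distribution with the cloud model.
Introduce an interval feature projection that conditions the cloud decoder, reducing drift and improving consistency.
Run extensive experiments across multiple cloud models (LlamaGen-XL Stage I/II, Anole) using MS-COCO captions as prompts.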
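
The interval idea in the list above can be illustrated with a small sketch. The review does not give the paper's exact formulas, so everything here is an assumption made for illustration: each logit is taken to lie in [center - radius, center + radius], the bounds p_t^l and p_t^u are the standard worst-case and best-case softmax values over those ranges, and the score is the total interval width plus a weighted variance of the widths. The names `interval_probs`, `uncertainty_score`, `route_token`, and `var_weight` are hypothetical.

```python
import math
from statistics import pvariance


def interval_probs(center, radius):
    """Return per-token (lower, upper) probability bounds.

    Assumption: logit i may lie anywhere in
    [center[i] - radius[i], center[i] + radius[i]].
    """
    lo_logits = [c - r for c, r in zip(center, radius)]
    hi_logits = [c + r for c, r in zip(center, radius)]
    shift = max(hi_logits)  # subtract the max for numerical stability
    lo_exp = [math.exp(x - shift) for x in lo_logits]
    hi_exp = [math.exp(x - shift) for x in hi_logits]
    sum_lo, sum_hi = sum(lo_exp), sum(hi_exp)
    # Lower bound: own logit at its minimum, every other logit at its maximum.
    p_lower = [l / (l + (sum_hi - h)) for l, h in zip(lo_exp, hi_exp)]
    # Upper bound: own logit at its maximum, every other logit at its minimum.
    p_upper = [h / (h + (sum_lo - l)) for l, h in zip(lo_exp, hi_exp)]
    return p_lower, p_upper


def uncertainty_score(p_lower, p_upper, var_weight=1.0):
    """Total interval width plus weighted variance of widths (assumed form)."""
    widths = [u - l for l, u in zip(p_lower, p_upper)]
    return sum(widths) + var_weight * pvariance(widths)


def route_token(score, threshold):
    """Accept a confident token on the device, otherwise defer to the cloud."""
    return "accept_locally" if score <= threshold else "send_to_cloud"
```

Because both bounds come from two sums over the vocabulary, the cost stays linear in vocabulary size, which is the property the abstract contrasts with enumerating discrete solution sets. The threshold would have to be tuned; the review does not report one.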

Figure 1: (a) Acceptance analysis of Lantern. The pie chart shows the ratio of max-prob vs. other tokens, and the bar chart compares Lantern without verification to the baseline. (b) Comparison of decoding frameworks. From left to right: baseline, Lantern, and our CIAR with Inter-Head and cloud-devi

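Experimental Results

Research Questions

RQ1: How can on-device interval-based uncertainty estimation help reduce redundant verification in cloud-device AR image generation?
RQ2: Can interval-enhanced decoding with distribution alignment maintain image fidelity while reducing cloud interaction?
RQ3: What is the trade-off between the prefix guidance rate and latency when injecting cloud prefixes in CIAR?
RQ4: How does continuous interval-based uncertainty differ from discrete solution-set enumeration in latency and quality on large token vocabularies?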

Key Results

Method	Model	Setting	CLIP (↑)	FID (↓)	F1 (↑)	HPSv2 (↑)	Latency (× vs. Base)	Steps (× vs. Base)	Cloud calls
Base	LlamaGen(Stage I)	Base	0.3161	23.6900	0.6097	22.74	x1.00	x1.00	100.00%
Eagle2	LlamaGen(Stage I)	Ours	0.3159	24.2459	0.5997	22.48	x2.53	x3.00	30.44%
Lantern	LlamaGen(Stage I)	Ours	0.3159	24.5828	0.5796	22.03	x1.70	x2.05	52.34%
Entropy-Lens	LlamaGen(Stage I)	Ours	0.3132	24.2459?	0.5997?	22.48	x2.53	x3.00	30.44%
CoDe (N = 0.3)	LlamaGen(Stage I)	Ours	0.2822	40.0709	0.5350	23.84	x1.00	x1.00	100.00%
	LlamaGen(Stage I)	Ours	0.3159	24.2459	0.5997	22.48	x2.53	x3.00	30.44%
Base	LlamaGen(Stage II)	Base	0.2822	40.0709	0.5350	23.84	x1.00	x1.00	100.00%
Eagle2	LlamaGen(Stage II)	Ours	0.3159	23.7103	0.6117	22.88	x1.02	x1.19	84.55%
Lantern	LlamaGen(Stage II)	Ours	0.3181	23.9510	0.5969	22.92	x1.25	x1.81	50.35%
Entropy-Lens	LlamaGen(Stage II)	Ours	0.2966	32.3533	0.5600	22.34	x1.57	x2.53	39.86%
CoDe (N = 0.3)	LlamaGen(Stage II)	Ours	0.2781	36.7520	0.5597	21.94	x1.55	x2.89	30.00%
Anole	Anole	Ours	0.3171	23.8593	0.5970	23.14	x1.87	x3.29	29.88%
Base	Anole	Base	0.3215	19.9455	0.6544	23.52	x1.00	x1.00	100.00%
Eagle2	Anole	Ours	0.3159	23.7103	0.6117	22.88	x1.02	x1.09	91.98%
Lantern	Anole	Ours	0.3181	23.9510	0.5969	22.92	x1.25	x1.81	50.35%
Entropy-Lens	Anole	Ours	0.2966	32.3533	0.5600	22.34	x1.57	x2.53	39.86%
CoDe (N = 0.3)	Anole	Ours	0.2781	36.7520	0.5597	21.94	x1.55	x2.89	30.00%

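CIAR achieves a 2.18× speed-up and reduces cloud requests by 70% compared with state-of-the-art speculative decoding methods.
CIAR maintains or improves visual fidelity metrics (CLIP, FID, F1, HPSv2) across the evaluated models.
The Inter-Head's interval-based uncertainty balances local token acceptance against cloud offloading better than entropy-based or random baselines.
Interval-enhanced decoding with interval feature conditioning preserves distribution alignment and improves the continuity of fine details.
The prefix injection strategy cuts unnecessary cloud requests while preserving image quality, and an optimal prefix rate balances guidance against latency.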
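
The trade-off between prefix rate, local acceptance, and cloud calls can be made concrete with a toy decoding loop. This is a simplified sketch, not the paper's algorithm: it assumes the cloud produces the first `prefix_rate` fraction of tokens one call per token, and that an uncertain position is simply regenerated by the cloud rather than verified with interval features. `collaborative_decode`, `cloud_step`, `device_step`, and `threshold` are hypothetical names.

```python
from typing import Callable, List, Tuple


def collaborative_decode(
    num_tokens: int,
    prefix_rate: float,
    cloud_step: Callable[[List[int]], int],
    device_step: Callable[[List[int]], Tuple[int, float]],
    threshold: float,
) -> Tuple[List[int], float]:
    """Return the generated tokens and the fraction of positions that used the cloud."""
    tokens: List[int] = []
    cloud_calls = 0
    prefix_len = int(num_tokens * prefix_rate)
    for pos in range(num_tokens):
        if pos < prefix_len:
            # Prefix injection: the cloud model supplies the leading tokens.
            tokens.append(cloud_step(tokens))
            cloud_calls += 1
            continue
        # The device drafts a token together with an uncertainty score.
        draft, score = device_step(tokens)
        if score <= threshold:
            tokens.append(draft)  # confident: accept locally
        else:
            tokens.append(cloud_step(tokens))  # uncertain: defer to the cloud
            cloud_calls += 1
    cloud_rate = (cloud_calls / num_tokens) if num_tokens else 0.0
    return tokens, cloud_rate
```

Raising `prefix_rate` adds guaranteed cloud calls up front but gives the device model more cloud-generated context; raising `threshold` accepts more tokens locally at the risk of drift. The returned rate corresponds to the "Cloud calls" column in the table only under the per-token counting assumed here.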

Figure 2: Overview of CIAR. (a) The cloud-side AR model generates image token prefixes from the input prompt. These prefixes are then sent to (b) a lightweight device model with Inter-Head accepts confident tokens locally and sends uncertain ones with interval features to the cloud for verification


This review was generated by AI and reviewed by a human editor.