QUICK REVIEW

[Paper Review] Progressive Checkerboards for Autoregressive Multiscale Image Generation

David Eigen|arXiv (Cornell University)|Feb 3, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

One-Sentence Summary

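The paper introduces a multiscale autoregressive sampler with a fixed progressive checkerboard ordering that samples in parallel within each scale, conditions between scales, and reaches competitive results on class-conditional ImageNet 256×256 with fewer sampling steps.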

ABSTRACT

A key challenge in autoregressive image generation is to efficiently sample independent locations in parallel, while still modeling mutual dependencies with serial conditioning. Some recent works have addressed this by conditioning between scales in a multiscale pyramid. Others have looked at parallelizing samples in a single image using regular partitions or randomized orders. In this work we examine a flexible, fixed ordering based on progressive checkerboards for multiscale autoregressive image generation. Our ordering draws samples in parallel from evenly spaced regions at each scale, maintaining full balance in all levels of a quadtree subdivision at each step. This enables effective conditioning both between and within scales. Intriguingly, we find evidence that in our balanced setting, a wide range of scale-up factors lead to similar results, so long as the total number of serial steps is constant. On class-conditional ImageNet, our method achieves competitive performance compared to recent state-of-the-art autoregressive systems with like model capacity, using fewer sampling steps.

Motivation and Objectives

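Propose and demonstrate a multiscale autoregressive sampler that samples locations in parallel within each scale without losing the ability to condition across scales.
Introduce a fixed progressive checkerboard ordering that stays balanced across the levels of a quadtree subdivision, giving control over both parallelism and conditioning.
Study how between-scale and within-scale conditioning interact, and how the total number of sampling steps affects performance.
Show that competitive results on ImageNet 256×256 are achievable with fewer sampling steps than recent autoregressive models.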

Proposed Method

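A Transformer-based autoregressive model with a block-causal mask and progressive checkerboard sampling blocks.
The current scale is conditioned on latent codes upsampled from the previous scale; locations are split into P blocks that are processed serially, with all locations inside a block sampled in parallel.
A balanced progressive checkerboard ordering (divide and conquer with a diagonal TL, BR, TR, BL pattern) keeps samples spatially balanced at every quadtree level.
Training runs on all scales in parallel using ground-truth codes; the cross-scale input combines the upsampled previous-scale latent codes with the current-scale outputs, plus learned positional embeddings.
The authors also experiment with a RoPE mixture that attends to positions in both the current and the previous block, and apply classifier-free guidance (CFG) on a staged schedule.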
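The balanced ordering described above can be illustrated with a small sketch. It assumes a Bayer-style recursive construction, in which the quadrant choice (TL, BR, TR, BL) is the fastest-changing digit of each location's rank; this is one ordering that satisfies the quadtree balance property and is not necessarily the paper's exact ordering. The names `balanced_ranks` and `split_into_blocks` are hypothetical.

```python
# Sketch only: a Bayer-style recursive ordering, assumed for illustration.
# The summary names the diagonal pattern TL, BR, TR, BL; the recursion below
# is an assumption about how that pattern is applied at each quadtree level.

# (row, col) quadrant offsets in visiting order: TL, BR, TR, BL.
PATTERN = [(0, 0), (1, 1), (0, 1), (1, 0)]


def balanced_ranks(size):
    """Return a size x size grid of visit ranks; size must be a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if size == 1:
        return [[0]]
    half = size // 2
    sub = balanced_ranks(half)
    grid = [[0] * size for _ in range(size)]
    for q, (qr, qc) in enumerate(PATTERN):
        for r in range(half):
            for c in range(half):
                # The quadrant index q is the lowest base-4 digit, so
                # consecutive ranks cycle through all four quadrants.
                grid[qr * half + r][qc * half + c] = 4 * sub[r][c] + q
    return grid


def split_into_blocks(size, num_blocks):
    """Split all locations into num_blocks equal blocks, in rank order."""
    total = size * size
    if num_blocks < 1 or total % num_blocks:
        raise ValueError("num_blocks must divide size * size")
    ranks = balanced_ranks(size)
    order = [None] * total
    for r in range(size):
        for c in range(size):
            order[ranks[r][c]] = (r, c)
    per_block = total // num_blocks
    return [order[i:i + per_block] for i in range(0, total, per_block)]


if __name__ == "__main__":
    for step, block in enumerate(split_into_blocks(4, 4)):
        print(step, block)
```

On a 4×4 grid with 4 blocks, the first block holds one location from each quadrant, and the first two blocks together form a checkerboard. Each block would be sampled in parallel, with blocks processed serially.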

Figure 1: Progressive checkerboard samples from our model using 2x scale factor and 8 steps per scale. Masking applied to sampled locations at each step after decoding for visualization.

Experimental Results

Research Questions

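RQ1: How does the progressive checkerboard ordering affect parallelism and conditioning in multiscale autoregressive generation?
RQ2: How do between-scale and within-scale conditioning interact, and how does the total number of sampling steps affect performance?
RQ3: On ImageNet, which scale-up factors yield high-quality image synthesis while balancing conditioning and parallelism?
RQ4: Can competitive results be reached with fewer sampling steps than state-of-the-art autoregressive models?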

Key Findings

Model	Type/Tokens	Params	FID	IS	Pre.	Rec.	Steps	Time (s)
DiT-XL/2	Diffu-KL	675M	2.24	278.2	0.83	0.57	1×250	11.9
MAR-L	MAR-KL	479M	1.78	296.0	0.81	0.60	64×100	26.4
GtR	MAR-KL	479M	1.81	297.4	—	—	32×30	—
xAR	Flow-KL	608M	1.28	292.5	0.82	0.62	4×50	7.7
LlamaGen-L	AR-VQ	343M	3.07	256.1	0.83	0.52	576	12.58
VAR-d16	AR-VQ	310M	3.30	274.4	0.84	0.51	10	0.12
PAR-L-4x	AR-VQ	343M	3.76	218.9	0.84	0.50	147	3.38
RandAR-L	AR-VQ	343M	2.55	288.8	0.81	0.58	88	1.97
NAR-L	AR-VQ	372M	3.06	263.9	0.81	0.53	31	1.01
ARPG-L	AR-VQ	320M	2.30	297.7	0.82	0.56	32	0.58
LPD-L	AR-VQ	337M	2.40	284.5	0.81	0.57	20	0.28
Checkerboard-L 2x cfg=1.4	AR-VQ	343M	2.72	302.5	0.81	0.56	17	0.52
Checkerboard-L 2x cfg=1.5	AR-VQ	343M	2.83	318.2	0.82	0.57	17	0.52
Checkerboard-L 4x cfg=1.7	AR-VQ	343M	2.79	311.5	0.80	0.57	17	0.52

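A spatially balanced progressive checkerboard ordering enables parallel sampling within each scale while preserving conditioning between scales.
In the multiscale setting, the total number of serial steps largely determines performance: scale factors of 2, 3, and 4 give similar results when the total step count is held fixed.
At 2x and 4x scale factors, the Checkerboard-L models reach competitive FID/IS in 17 total steps. At 0.52 s, inference is faster than LlamaGen-L, PAR-L-4x, RandAR-L, NAR-L, and ARPG-L, though VAR-d16 (0.12 s) and LPD-L (0.28 s) are faster still.
Across scale factors, roughly 17 total steps give near-optimal performance; adding steps beyond that brings diminishing returns.
The RoPE mixture brought no clear improvement, suggesting that input conditioning is enough for the early layers to extract the conditioning information they need.
On ImageNet 256×256, the Checkerboard-L models reach FID 2.72–2.83 and IS of about 302–318 in 17 steps, using fewer steps and less time than the PAR-L-4x (147 steps, 3.38 s) and RandAR-L (88 steps, 1.97 s) baselines, although RandAR-L has a lower FID (2.55).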
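The step and speed comparisons above can be checked directly against the AR-VQ rows of the table. The sketch below copies the Steps and Time (s) columns and derives which baselines are slower or faster than Checkerboard-L; the helper names are hypothetical, and the per-step time is a simple ratio that ignores any fixed overhead such as decoding.

```python
# (steps, seconds) for the AR-VQ rows, copied from the results table.
AR_VQ_ROWS = {
    "LlamaGen-L": (576, 12.58),
    "VAR-d16": (10, 0.12),
    "PAR-L-4x": (147, 3.38),
    "RandAR-L": (88, 1.97),
    "NAR-L": (31, 1.01),
    "ARPG-L": (32, 0.58),
    "LPD-L": (20, 0.28),
    "Checkerboard-L": (17, 0.52),
}


def seconds_per_step(name):
    """Average wall-clock time per sampling step (a rough ratio only)."""
    steps, seconds = AR_VQ_ROWS[name]
    return seconds / steps


def slower_than(reference):
    """Names of models whose total time exceeds the reference model's."""
    ref_seconds = AR_VQ_ROWS[reference][1]
    return [n for n, (_, s) in AR_VQ_ROWS.items() if s > ref_seconds]


def faster_than(reference):
    """Names of models whose total time is below the reference model's."""
    ref_seconds = AR_VQ_ROWS[reference][1]
    return [n for n, (_, s) in AR_VQ_ROWS.items() if s < ref_seconds]


if __name__ == "__main__":
    print("slower:", slower_than("Checkerboard-L"))
    print("faster:", faster_than("Checkerboard-L"))
    for name in AR_VQ_ROWS:
        print(name, round(seconds_per_step(name), 4))
```

Only VAR-d16 uses fewer steps than Checkerboard-L, so the speed advantage of LPD-L comes from a lower time per step rather than a lower step count.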

Figure 2: Overview of our multiscale blockwise checkerboard autoregressor.
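The block-causal masking used by the blockwise autoregressor can be sketched as follows. The sketch assumes that tokens are laid out in block order across all scales and that a token may attend to its own block and every earlier block; whether the paper's mask includes the token's own block depends on how its inputs are constructed, which this summary does not specify. The function names are hypothetical.

```python
def block_ids_from_sizes(block_sizes):
    """Expand per-block token counts into one block index per token."""
    ids = []
    for block_index, count in enumerate(block_sizes):
        ids.extend([block_index] * count)
    return ids


def block_causal_mask(block_ids):
    """Return mask[i][j], True when query token i may attend to key token j.

    Assumption: attention is allowed within a block and to all earlier
    blocks, and blocked to all later blocks.
    """
    return [
        [key_block <= query_block for key_block in block_ids]
        for query_block in block_ids
    ]


if __name__ == "__main__":
    # Three blocks holding 2, 2, and 1 tokens respectively.
    ids = block_ids_from_sizes([2, 2, 1])
    for row in block_causal_mask(ids):
        print("".join("x" if allowed else "." for allowed in row))
```

Because every token in a block sees the same set of earlier blocks, all tokens of a block can be predicted in one forward pass, which is what makes parallel sampling within a block possible.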


This review was generated by AI and reviewed by a human editor.