QUICK REVIEW

[Paper Review] Scaling Quantum Machine Learning without Tricks: High-Resolution and Diverse Image Generation

Jonas Jäger, Florian J. Kiwit|arXiv (Cornell University)|Feb 27, 2026

Quantum Computing Algorithms and Architecture | Cited by 0

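One-Sentence Summary

The paper trains end-to-end quantum Wasserstein GANs that generate full-resolution, diverse MNIST, Fashion-MNIST, and SVHN images without dimensionality reduction or patching. Task-specific quantum circuit design and multimodal noise inputs yield high-quality images, and the results remain promising under shot noise.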

ABSTRACT

Quantum generative modeling is a rapidly evolving discipline at the intersection of quantum computing and machine learning. Contemporary quantum machine learning is generally limited to toy examples or heavily restricted datasets with few elements. This is not only due to the current limitations of available quantum hardware but also due to the absence of inductive biases arising from application-agnostic designs. Current quantum solutions must resort to tricks to scale down high-resolution images, such as relying heavily on dimensionality reduction or utilizing multiple quantum models for low-resolution image patches. Building on recent developments in classical image loading to quantum computers, we circumvent these limitations and train quantum Wasserstein GANs on the established classical MNIST and Fashion-MNIST datasets. Using the complete datasets, our system generates full-resolution images across all ten classes and establishes a new state-of-the-art performance with a single end-to-end quantum generator without tricks. As a proof-of-principle, we also demonstrate that our approach can be extended to color images, exemplified on the Street View House Numbers dataset. We analyze how the choice of variational circuit architecture introduces inductive biases, which crucially unlock this performance. Furthermore, enhanced noise input techniques enable highly diverse image generation while maintaining quality. Finally, we show promising results even under quantum shot noise conditions.

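Motivation and Objectives

Demonstrate full-resolution, end-to-end quantum image generation on standard benchmarks without tricks such as patching or dimensionality reduction.
Show how task-specific quantum circuit design (inductive bias) enables scalable, diverse, high-quality image generation.
Study how multimodal noise inputs and shot noise affect performance and diversity.
Provide empirical evidence that task-aligned circuit structures outperform generic, task-agnostic designs.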

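Proposed Method

Use a quantum GAN with a quantum generator and a classical discriminator within the Wasserstein GAN framework (WGAN-GP).
Encode images with an FRQI-related representation so that full-size images can be generated without dimensionality reduction.
Introduce multimodal, learnable noise inputs to produce diverse samples and avoid mode collapse.
Design a task-specific quantum circuit architecture matched to the FRQI encoding, combining layer-wise noise uploading, entanglement between qubits, and rotations on the color qubit.
Decode the quantum state into an image, and train the classical discriminator to supply gradient signals through the Wasserstein loss.
Evaluate on MNIST, Fashion-MNIST, and SVHN (color), measuring quality (FID) and diversity, and also consider shot noise.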
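To make the encoding step concrete, the following is a minimal sketch of the textbook FRQI mapping for a flattened grayscale image, using only the Python standard library. It is an illustration under stated assumptions, not the paper's implementation: the review only says the representation is "FRQI-related", so the exact angle convention, the amplitude ordering, and the function names `frqi_encode` and `frqi_decode` are hypothetical choices made here.

```python
import math

def frqi_encode(pixels):
    """Map grayscale intensities in [0, 1] to FRQI amplitudes.

    Assumed layout (illustrative): entry 2*i is the amplitude of
    |color=0>|address=i>, entry 2*i + 1 that of |color=1>|address=i>.
    """
    n = len(pixels)
    if n == 0 or n & (n - 1):
        raise ValueError("number of pixels must be a power of two")
    norm = 1.0 / math.sqrt(n)  # uniform superposition over pixel addresses
    state = []
    for p in pixels:
        theta = 0.5 * math.pi * p  # intensity -> angle in [0, pi/2]
        state.extend((norm * math.cos(theta), norm * math.sin(theta)))
    return state

def frqi_decode(state):
    """Recover intensities from the color-qubit amplitudes at each address."""
    pixels = []
    for i in range(0, len(state), 2):
        a0, a1 = abs(state[i]), abs(state[i + 1])
        pixels.append(math.atan2(a1, a0) / (0.5 * math.pi))
    return pixels

# A flattened 4x4 intensity ramp: 16 pixels -> 4 address qubits + 1 color qubit.
image = [i / 15 for i in range(16)]
state = frqi_encode(image)
decoded = frqi_decode(state)
```

In this layout a 4x4 image needs 32 amplitudes, i.e. five qubits, which is why the qubit count grows only logarithmically with the number of pixels.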

Figure 1 : Overview of the proposed QGAN generator and training workflow for a $4\times 4$ -pixel grayscale image. (1) Noise Sampling: a multimodal latent distribution is formed by uniformly sampling a discrete mode index $m\in\{1,2\}$ and drawing Gaussian noise $\varepsilon_{a}\sim\mathcal{N}(0,1)$

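Experimental Results

Research Questions

RQ1 Can an end-to-end quantum generator produce high-quality, full-resolution images on standard benchmarks without dimensionality reduction or patch-based methods?
RQ2 Do task-specific quantum circuit design and FRQI-style encoding provide inductive biases that enable scalable quantum image generation?
RQ3 How do multimodal noise inputs and shot-noise conditions affect the image quality and diversity of quantum generative models?
RQ4 Compared with earlier patch-based or task-agnostic QGAN approaches, how strongly do architectural choices affect performance?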

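Key Findings

A large-scale QGAN (64 layers, 40 noise modes) generates all ten classes of MNIST and Fashion-MNIST images with high visual quality and rich intra-class diversity (FID: MNIST 118, Fashion-MNIST 91, SVHN 84).
The task-specific generator design with FRQI encoding clearly outperforms task-agnostic and amplitude-based configurations, producing sharper images with better-defined edges and more balanced saturation.
Multimodal noise with learnable tuning increases intra-class variation and reduces mode mixing, outperforming unimodal and fixed multimodal settings (as reflected in the FID improvements in the ablation analysis).
Over-provisioning modes (multiple noise modes per class) enhances intra-class diversity and can reveal finer subclasses (for example, boots and dresses show distinct modes).
Training under finite shot noise helps preserve pixel information and yields more robust, more uniformly distributed marginal probabilities over the address qubits, which supports scalability on hardware.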
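The role of shot noise can be illustrated with a small simulation: instead of reading exact probabilities, pixel intensities are estimated from a finite number of measurement outcomes. The sketch below assumes an ideal FRQI-style state (uniform address register, color-qubit probability sin^2 of the pixel angle) and is not the paper's training procedure. The function name `estimate_pixels_from_shots` and the shot budget are hypothetical.

```python
import math
import random

def estimate_pixels_from_shots(pixels, shots, rng):
    """Estimate intensities from simulated (address, color) measurement outcomes."""
    n = len(pixels)
    counts = [[0, 0] for _ in range(n)]  # per address: [color=0 hits, color=1 hits]
    for _ in range(shots):
        i = rng.randrange(n)  # assumed: address marginal is exactly uniform
        p1 = math.sin(0.5 * math.pi * pixels[i]) ** 2  # P(color=1 | address i)
        color = 1 if rng.random() < p1 else 0
        counts[i][color] += 1
    estimates = []
    for c0, c1 in counts:
        total = c0 + c1
        if total == 0:
            estimates.append(None)  # address never observed at this shot budget
        else:
            theta = math.asin(math.sqrt(c1 / total))
            estimates.append(theta / (0.5 * math.pi))
    return estimates

rng = random.Random(1)
true_pixels = [i / 15 for i in range(16)]  # flattened 4x4 intensity ramp
estimates = estimate_pixels_from_shots(true_pixels, 200_000, rng)
```

Because each address receives only about `shots / n` outcomes, an address marginal that is far from uniform leaves some pixels with very few samples, which is one way to read the finding about evenly distributed address probabilities.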

Figure 2 : Illustration of multimodal noise modeling (left to right). Quantum circuit perspective of implementing a bimodal mixture distribution via controlled rotations sampling the classical bit $m$ uniformly and $\varepsilon$ normally (unimodal). $z_{0}$ and $z_{1}$ denote the tuned noise (shifte
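A classical sketch of the bimodal mixture described in the caption: a mode index is drawn uniformly, a Gaussian value is drawn, and the Gaussian is then tuned per mode. The affine form (shift plus scale), the zero-based mode index, and the names `shifts`, `scales`, and `sample_multimodal_noise` are assumptions for illustration. In training, the per-mode parameters would be learnable rather than fixed as they are here.

```python
import random

def sample_multimodal_noise(shifts, scales, rng):
    """Draw one latent value from a uniform mixture of shifted, scaled Gaussians."""
    m = rng.randrange(len(shifts))  # discrete mode index, sampled uniformly
    eps = rng.gauss(0.0, 1.0)       # unimodal standard-normal noise
    z = shifts[m] + scales[m] * eps  # assumed per-mode affine tuning
    return m, z

rng = random.Random(0)
shifts = [-2.0, 2.0]  # hypothetical mode centers
scales = [0.5, 0.5]   # hypothetical mode widths
samples = [sample_multimodal_noise(shifts, scales, rng) for _ in range(2000)]
```

With well-separated shifts, the two modes occupy distinct regions of the latent space, so the generator can map them to different image classes or subclasses.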


This review was generated by AI and reviewed by a human editor.