QUICK REVIEW

[Paper Digest] Quantum circuit optimization with deep reinforcement learning

Thomas Fösel, Murphy Yuezhen Niu|arXiv (Cornell University)|Mar 13, 2021

Quantum Computing Algorithms and Architecture|References 43|Cited by 77

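ONE-SENTENCE SUMMARY

The authors develop a deep reinforcement learning method for quantum circuit optimization that accounts for hardware characteristics, achieving average reductions of 27% in depth and 15% in gate count on 12-qubit random circuits and showing that the approach extends to larger circuits.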

ABSTRACT

A central aspect for operating future quantum computers is quantum circuit optimization, i.e., the search for efficient realizations of quantum algorithms given the device capabilities. In recent years, powerful approaches have been developed which focus on optimizing the high-level circuit structure. However, these approaches do not consider and thus cannot optimize for the hardware details of the quantum architecture, which is especially important for near-term devices. To address this point, we present an approach to quantum circuit optimization based on reinforcement learning. We demonstrate how an agent, realized by a deep convolutional neural network, can autonomously learn generic strategies to optimize arbitrary circuits on a specific architecture, where the optimization target can be chosen freely by the user. We demonstrate the feasibility of this approach by training agents on 12-qubit random circuits, where we find on average a depth reduction by 27% and a gate count reduction by 15%. We examine the extrapolation to larger circuits than used for training, and envision how this approach can be utilized for near-term quantum devices.

MOTIVATION AND OBJECTIVES

Motivate quantum circuit optimization (QCO) for NISQ devices with hardware-aware constraints.
Propose a reinforcement learning (RL) framework to autonomously learn QCO strategies.
Enable optimization of arbitrary circuits on a given architecture with a user-defined objective.
Demonstrate the approach on 12-qubit random circuits and explore extrapolation to larger circuits.

PROPOSED METHOD

Represent circuits as diagrams and formulate QCO as an RL problem where states are circuits and actions are equivalence-preserving transformations.
Use hard (always-beneficial) and soft (context-dependent) transformation rules; pruning applies all hard transformations after the agent selects a soft one.
Employ a deep convolutional neural network (DCNN) as the agent, mapping circuit observations to a policy and a value function so that it can be trained with Proximal Policy Optimization (PPO) in an advantage actor-critic (AAC) framework.
Define the reward via a circuit quality measure q(s) that correlates with the probability of successful circuit execution, using r_t = -(q(s_{t+1}) - q(s_t)).
Adopt a 3D-convolutional observation representation over qubit index, moment, and gate class, with a structured mapping from transformations to policy outputs to keep the action space tractable.
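
The formulation above can be illustrated with a toy sketch. This is not the authors' implementation: the `Gate` class, the two-gate set, the single "hard" rule (cancelling identical self-inverse gates in adjacent moments), the one-hot observation layout, and the choice of q(s) as depth plus gate count are all assumptions made for illustration. With the sign convention r_t = -(q(s_{t+1}) - q(s_t)), q is treated here as a cost, so a transformation that lowers q earns a positive reward.

```python
from dataclasses import dataclass

# Assumed two-gate set; both gates are self-inverse, which the toy hard rule relies on.
GATE_CLASSES = ("H", "CZ")

@dataclass(frozen=True)
class Gate:
    name: str      # gate class, one of GATE_CLASSES
    qubits: tuple  # qubit indices the gate acts on

# A circuit is a list of moments; each moment is a list of gates on disjoint qubits.

def depth(circuit):
    return len(circuit)

def gate_count(circuit):
    return sum(len(moment) for moment in circuit)

def quality(circuit, w_depth=1.0, w_gates=1.0):
    # Hypothetical q(s): weighted depth plus gate count, lower is better.
    return w_depth * depth(circuit) + w_gates * gate_count(circuit)

def reward(before, after):
    # r_t = -(q(s_{t+1}) - q(s_t)): positive when the transformation lowers q.
    return -(quality(after) - quality(before))

def prune(circuit):
    # Toy "hard" rule, applied until it no longer fires: an identical
    # self-inverse gate in two adjacent moments cancels to the identity.
    moments = [list(m) for m in circuit]
    changed = True
    while changed:
        changed = False
        for t in range(len(moments) - 1):
            for g in moments[t]:
                if g in moments[t + 1]:
                    moments[t].remove(g)
                    moments[t + 1].remove(g)
                    changed = True
                    break
            if changed:
                break
        moments = [m for m in moments if m]  # drop moments left empty
    return moments

def encode(circuit, n_qubits):
    # Observation indexed as obs[qubit][moment][gate_class], one-hot per gate.
    obs = [[[0] * len(GATE_CLASSES) for _ in circuit] for _ in range(n_qubits)]
    for t, moment in enumerate(circuit):
        for g in moment:
            c = GATE_CLASSES.index(g.name)
            for qb in g.qubits:
                obs[qb][t][c] = 1
    return obs

demo = [
    [Gate("H", (0,)), Gate("CZ", (1, 2))],
    [Gate("H", (0,)), Gate("CZ", (1, 2))],
    [Gate("H", (1,))],
]
pruned = prune(demo)
print(depth(pruned), gate_count(pruned))  # 1 1
print(reward(demo, pruned))               # 6.0
```

Because each reward is a difference of q between consecutive states, the rewards telescope: summed over an episode they equal q(s_0) - q(s_T), so maximizing the undiscounted return amounts to minimizing q of the final circuit.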

EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a deep RL agent learn hardware-aware circuit transformations that reduce depth and gate counts while preserving logical equivalence?
RQ2: How well does the trained agent generalize to larger circuits beyond its training size?
RQ3: What is the impact of the chosen reward function on learning efficiency and optimization quality?
RQ4: How does RL compare to simulated annealing on random expanded circuits under the same hardware model?
RQ5: Can the approach handle varying gate sets and connectivities pertinent to near-term devices?

Key Findings

On 12-qubit random circuits, the agent achieved average depth reduction of 27% and gate-count reduction of 15%.
Training consisted of two phases, reaching mean depth d ≈ 27.20 and mean gate count n ≈ 97.86 at around epoch 1000, outperforming pruning and simulated annealing.
The trained agent generalizes to larger circuits; on 50-qubit random circuits, starting from pruned circuits, it reduces depth to 110.84 and gate count to 1616.3 within 2500 transformations, comparable to large-step simulated annealing results.
Compared to simulated annealing on the same dataset, the RL agent achieves better or comparable optimization with far fewer steps, but it requires substantial training time (6–7 days on 32 CPUs).
For QAOA-MaxCut circuits, a generic agent found improvements (e.g., d from 75 to 68, n from 142 to 138) and a specialized agent achieved d=66, n=138.
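
To put the QAOA-MaxCut figures on the same footing as the percentages quoted for random circuits, the relative reductions can be worked out directly. The sketch below uses only the numbers listed above; it assumes the specialized agent starts from the same circuit (d = 75, n = 142) as the generic one, which the summary implies but does not state. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(before, after):
    # Percentage by which a metric shrank relative to its starting value.
    return 100.0 * (before - after) / before

# (before, after) pairs taken from the findings listed above.
qaoa = {
    "generic agent": {"depth": (75, 68), "gate count": (142, 138)},
    # Assumption: same starting circuit as the generic agent.
    "specialized agent": {"depth": (75, 66), "gate count": (142, 138)},
}

for agent, metrics in qaoa.items():
    for metric, (before, after) in metrics.items():
        pct = relative_reduction(before, after)
        print(f"{agent}: {metric} {before} -> {after} ({pct:.1f}% reduction)")
```

Under that assumption, the depth reductions come to about 9.3% (generic) and 12.0% (specialized), with a gate-count reduction of about 2.8% in both cases.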

This digest was generated by AI and reviewed by human editors.