QUICK REVIEW

[Paper Review] Quantum Architecture Search via Deep Reinforcement Learning

En-Jui Kuo, Yao-Lung L. Fang | arXiv (Cornell University) | Apr 15, 2021

Quantum Computing Algorithms and Architecture | References: 91 | Cited by: 37

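One-Sentence Summary

A deep reinforcement learning framework that builds quantum gate sequences from scratch to prepare target quantum states, demonstrated on Bell and GHZ states using A2C and PPO.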

ABSTRACT

Recent advances in quantum computing have drawn considerable attention to building realistic application for and using quantum computers. However, designing a suitable quantum circuit architecture requires expert knowledge. For example, it is non-trivial to design a quantum gate sequence for generating a particular quantum state with as fewer gates as possible. We propose a quantum architecture search framework with the power of deep reinforcement learning (DRL) to address this challenge. In the proposed framework, the DRL agent can only access the Pauli-$X$, $Y$, $Z$ expectation values and a predefined set of quantum operations for learning the target quantum state, and is optimized by the advantage actor-critic (A2C) and proximal policy optimization (PPO) algorithms. We demonstrate a successful generation of quantum gate sequences for multi-qubit GHZ states without encoding any knowledge of quantum physics in the agent. The design of our framework is rather general and can be employed with other DRL architectures or optimization methods to study gate synthesis and compilation for many quantum states.

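Research Motivation and Objectives

Automate the design of quantum circuit architectures without requiring extensive physics expertise.
Build a DRL framework in which an agent constructs a quantum circuit step by step to reach a target state.
Demonstrate gate synthesis for multi-qubit entangled states (Bell and GHZ).
Evaluate performance in both noise-free and noisy quantum simulations.
Explore how well the framework generalizes to other DRL architectures and quantum states.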

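Proposed Method

Formulate quantum architecture search as a reinforcement learning problem in which each action appends a quantum gate to the circuit.
Use the fidelity between the generated state and the target state as the primary reward signal.
Provide the per-qubit Pauli expectation values as observations to guide learning.
Compare policy optimization algorithms, including advantage actor-critic (A2C) and proximal policy optimization (PPO).
Train the policy with gradient-based optimization (Adam) in a simulated quantum environment.
Use a custom OpenAI Gym environment to manage states, actions, and rewards.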
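
The formulation above (actions append gates, observations are per-qubit Pauli expectation values, reward comes from fidelity) can be illustrated with a small NumPy sketch. This is not the authors' environment: the class name `ToyCircuitEnv`, the reduced gate set (Hadamard and CNOT only), the 0.95 fidelity threshold, the 20-step limit, and the use of raw fidelity as the reward are all assumptions made for illustration. The `reset`/`step` methods only mimic the shape of the Gym interface and do not depend on Gym.

```python
import numpy as np

# Single-qubit matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # projector |0><0|
P1 = np.array([[0, 0], [0, 1]], dtype=complex)  # projector |1><1|


def embed(ops, n):
    """Tensor product with ops[q] on qubit q, identity elsewhere (qubit 0 leftmost)."""
    full = np.array([[1.0 + 0j]])
    for q in range(n):
        full = np.kron(full, ops.get(q, I2))
    return full


def cnot(control, target, n):
    """CNOT as |0><0| (x) I + |1><1| (x) X on the chosen qubits."""
    return embed({control: P0}, n) + embed({control: P1, target: X}, n)


class ToyCircuitEnv:
    """Hypothetical toy environment: each action appends one gate to the circuit."""

    def __init__(self, target, n, threshold=0.95, max_steps=20):
        self.target, self.n = target, n
        self.threshold, self.max_steps = threshold, max_steps
        # Assumed action set: H on every qubit, then CNOT on every ordered pair.
        self.actions = [embed({q: H}, n) for q in range(n)]
        self.actions += [cnot(c, t, n) for c in range(n) for t in range(n) if c != t]
        self.reset()

    def reset(self):
        self.state = np.zeros(2 ** self.n, dtype=complex)
        self.state[0] = 1.0  # start from |00...0>
        self.steps = 0
        return self.observe()

    def observe(self):
        """Per-qubit Pauli-X, Y, Z expectation values, flattened to length 3n."""
        obs = [np.vdot(self.state, embed({q: P}, self.n) @ self.state).real
               for q in range(self.n) for P in (X, Y, Z)]
        return np.array(obs)

    def fidelity(self):
        """Overlap |<target|state>|^2 between pure states."""
        return abs(np.vdot(self.target, self.state)) ** 2

    def step(self, action):
        self.state = self.actions[action] @ self.state
        self.steps += 1
        reward = self.fidelity()  # assumption: reward is the raw fidelity
        done = bool(reward >= self.threshold or self.steps >= self.max_steps)
        return self.observe(), reward, done, {}
```

With a two-qubit Bell target `np.array([1, 0, 0, 1]) / np.sqrt(2)`, the assumed action list is `[H on qubit 0, H on qubit 1, CNOT 0→1, CNOT 1→0]`, so calling `step(0)` followed by `step(2)` reaches fidelity 1 and ends the episode. An agent trained with A2C or PPO would have to discover such a sequence from the observations and rewards alone.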

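Experimental Results

Research Questions

RQ1: Can a DRL agent synthesize a quantum gate sequence from scratch that reaches a specified target state within tolerance?
RQ2: How do A2C and PPO compare in convergence speed and stability on the quantum gate search task?
RQ3: How does noise affect DRL-driven gate synthesis for two-qubit and three-qubit states?
RQ4: Can the framework scale to larger qubit systems and more complex target states?
RQ5: To what extent can the method operate without quantum physics knowledge embedded in the agent?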

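Key Findings

Both A2C and PPO can train an agent from scratch to synthesize Bell and GHZ states in a noise-free environment.
On the two-qubit and three-qubit tasks, PPO outperforms A2C, converging faster and more stably.
In a noisy environment, PPO remains effective for Bell state synthesis; the fidelity depends on the noise level, but training still converges.
The size of the action set grows quadratically with the number of qubits, so multi-qubit gate synthesis is possible without exponential growth in the number of actions.
The framework learns gate sequences through a reward based on state fidelity, without encoding detailed quantum physics knowledge in the agent.
The authors provide a reusable DRL-based environment for quantum circuit design and analysis.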
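
The quadratic growth of the action set mentioned above can be made concrete with a short count. The gate set here is an assumption for illustration, not necessarily the one used in the paper: a fixed number of single-qubit gates on each qubit (the default of 5 is hypothetical) plus one CNOT for every ordered (control, target) pair. The single-qubit part grows linearly and the CNOT part grows as n(n-1), while the state-vector dimension grows as 2^n.

```python
def action_count(n_qubits, single_qubit_gates=5):
    """Count actions in a hypothetical gate set: k single-qubit gates per qubit
    plus one CNOT per ordered (control, target) pair."""
    one_qubit = single_qubit_gates * n_qubits   # linear term: k * n
    two_qubit = n_qubits * (n_qubits - 1)       # quadratic term: n * (n - 1)
    return one_qubit + two_qubit


# Compare the number of actions with the state-vector dimension 2**n.
for n in (2, 3, 4, 8):
    print(n, action_count(n), 2 ** n)
```

Under these assumptions, eight qubits give 96 actions against a state-vector dimension of 256, and the gap widens quickly as more qubits are added.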

This review was generated by AI and checked by human editors.