QUICK REVIEW

[Paper Review] The pros and cons of using deep reinforcement learning or genetic algorithms to design control schemes for quantum state transfer on qubit chains

Sofía Perón Santana, Ariel Fiuri|arXiv (Cornell University)|Jan 9, 2026
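Quantum Information and Cryptography | Cited by 0
One-Sentence Summary

The paper compares genetic algorithms (GA) and deep reinforcement learning (DRL) for designing external controls that transfer quantum states along qubit chains. It finds that GA achieves fast, high-fidelity transfer, while DRL has an advantage in noise robustness but performs worse on longer chains and is computationally more expensive.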

ABSTRACT

In recent years, control methods based on different optimization techniques have shed light on the possibilities of processing information in many quantum systems. When exploring the transmission of quantum states, faster transmission times are mandatory to avoid the deleterious effects of multiple sources of decoherence that spoil the transmission process. In particular, using Reinforcement Learning to devise sequences of step-wise external controls provides good transfer policies at short transmission times. We present two approaches to control the transmission of quantum states in qubit chains using external controls to force the dynamical evolution of the chain state. The first approach relies on the well-known Genetic Algorithm to generate a sequence of external controls, while the second approach uses a variant of Reinforcement Learning. The Genetic algorithm achieves excellent transmission fidelity at as short transmission times as Reinforcement Learning, surpassing the fidelities achieved by the latter method. Nevertheless, the Reinforcement Learning method offers robust control policies when the control pulses are noisy enough, owing to an imperfect timing of the pulses, deficient control devices, or other sources of phase decoherence. We present the regime where each method is best suited to control the transmission of arbitrary qubit states.

Motivation and Objectives

  • Motivate the use of optimization-based control for quantum state transfer in qubit chains to mitigate decoherence.
  • Compare GA and DRL approaches for generating sequences of external controls that drive state transfer.
  • Characterize performance under fluctuations and determine regimes where each method excels.
  • Provide guidance on when to prefer GA vs. DRL for fast and robust quantum state transfer.

Proposed Method

  • Model the qubit chain with the XX Hamiltonian and piecewise-constant external fields h_i(t) acting as controls.
  • Represent control sequences as chromosomes (GA) or as actions in a Deep Q-network (DRL) within an MDP framework.
  • Evaluate GA by evolving populations of control sequences using fitness = maximum transmission probability over a time window.
  • Implement DRL via a Deep Q-Network with a Q-network, target network, and replay memory to learn action-value estimates.
  • Compare action-by-site controls to a fixed-sets-of-actions scheme and analyze performance versus chain length.
  • Test robustness to fluctuations by training and validating DRL models under noisy or imperfect-control conditions.
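The model and the GA fitness described in the list above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the dynamics are restricted to the one-excitation subspace, where an XX chain reduces to a tridiagonal matrix with nearest-neighbour hopping `J` and on-site energies set by the local fields. The sign and scaling conventions, the function names, and the choice to take the maximum over every step of the sequence (rather than over a specific time window) are all assumptions made for the example.

```python
import numpy as np

def chain_hamiltonian(fields, J=1.0):
    """One-excitation block of an XX chain (assumed convention):
    hopping J between neighbours, local control fields on the diagonal."""
    fields = np.asarray(fields, dtype=float)
    n = len(fields)
    hop = J * (np.eye(n, k=1) + np.eye(n, k=-1))
    return hop + np.diag(fields)

def step_propagator(fields, dt, J=1.0):
    """U = exp(-i H dt) for one interval with constant fields."""
    w, V = np.linalg.eigh(chain_hamiltonian(fields, J))
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def transfer_fitness(control_sequence, dt, J=1.0):
    """Start with the excitation on the first site, apply the piecewise-constant
    controls step by step, and return the largest population seen on the last site."""
    n = len(control_sequence[0])
    psi = np.zeros(n, dtype=complex)
    psi[0] = 1.0
    best = 0.0
    for fields in control_sequence:
        psi = step_propagator(fields, dt, J) @ psi
        best = max(best, abs(psi[-1]) ** 2)
    return best
```

For a two-site chain with no fields, `transfer_fitness([np.zeros(2)] * 10, dt=np.pi / 20)` covers a total time of π/2 in units of 1/J, which is the time at which the excitation has fully moved to the second site in this convention.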
Figure 1: The cartoon in the figure depicts a system of $N$ qubits and its time evolution. The initial state, shown at the leftmost extreme of the cartoon, corresponds to a one-excitation quantum state. The step-wise evolution operator for a given interval, $U_{k}=U(\tau_{k})$ , acts over all the qu
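The Deep Q-Network ingredients named in the method (a Q-network, a target network, and a replay memory) can be illustrated with a deliberately small sketch. To stay dependency-free, a linear function `LinearQ` stands in for the deep network, so this is not a DQN proper and not the authors' implementation. The state vector, the reward, the hyperparameters, and all function names are hypothetical.

```python
from collections import deque
import numpy as np

class LinearQ:
    """Linear stand-in for the Q-network: Q(s, .) = W @ s (assumption)."""
    def __init__(self, state_dim, n_actions, rng):
        self.W = 0.01 * rng.standard_normal((n_actions, state_dim))

    def q_values(self, state):
        return self.W @ state

def select_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over the action-value estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(q.W.shape[0]))
    return int(np.argmax(q.q_values(state)))

def sample_batch(memory, batch_size, rng):
    """Uniform sampling (with replacement, for simplicity) from the replay memory."""
    idx = rng.integers(len(memory), size=batch_size)
    return [memory[int(i)] for i in idx]

def dqn_update(online, target, batch, gamma=0.95, lr=0.05):
    """One pass of gradient steps on the squared TD error.
    Bootstrapped targets come from the frozen target copy."""
    for s, a, r, s_next, done in batch:
        bootstrap = 0.0 if done else gamma * np.max(target.q_values(s_next))
        td_error = r + bootstrap - online.q_values(s)[a]
        online.W[a] += lr * td_error * s

def sync_target(target, online):
    """Copy the online weights into the target network."""
    target.W = online.W.copy()

# Toy usage: a single terminal transition with reward 1 (hypothetical data).
rng = np.random.default_rng(0)
online = LinearQ(state_dim=2, n_actions=3, rng=rng)
target = LinearQ(state_dim=2, n_actions=3, rng=rng)
sync_target(target, online)
memory = deque(maxlen=10_000)
memory.append((np.array([1.0, 0.0]), 0, 1.0, np.array([0.0, 1.0]), True))
for step in range(200):
    dqn_update(online, target, sample_batch(memory, 4, rng))
    if step % 20 == 0:
        sync_target(target, online)
```

In a state-transfer setting the state would presumably encode the chain's wavefunction and the reward would depend on the transmission probability, but those choices are not specified in this summary.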

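Experimental Results

Research Questions

  • RQ1: How do GA-derived control sequences compare with DRL-derived sequences at achieving high-fidelity quantum state transfer on homogeneous qubit chains?
  • RQ2: Which method performs better at short versus long transmission times, and under noisy or imperfect control?
  • RQ3: Over which ranges of chain length and control parameters does GA outperform DRL, or vice versa?
  • RQ4: How robust are the learned control policies to open-quantum-system effects and fluctuations in the control hardware?
  • RQ5: What is the trade-off in computational cost between GA and DRL for this problem?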

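Key Findings

  • GA achieves excellent transfer fidelity at short transmission times, typically matching or surpassing DRL.
  • GA with site-by-site controls outperforms the action set of Zhang et al. in both fidelity and robustness, across the chain lengths studied.
  • DRL (DQN) struggles to produce high-quality state transfer on longer chains, although some instances on shorter chains approach the quantum speed limit.
  • DRL policies trained in a fluctuating environment are robust to noise, but results can vary between training runs and the computational cost is significant.
  • For longer chains, GA converges faster and delivers high-fidelity transfer more reliably, whereas DRL may offer better robustness at the expense of computational cost and consistency.
  • When fluctuations are present, DRL-trained policies can maintain their performance, whereas GA sequences may degrade if fluctuations were not included during training.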
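One way to probe the robustness findings above is to replay a fixed control sequence many times with perturbed fields and look at the spread of the final transmission probability. The sketch below assumes independent Gaussian fluctuations of width `sigma` on every field value and the same one-excitation XX model with hopping `J`. The noise model, the function names, and the parameters are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def final_transfer_probability(control_sequence, dt, J=1.0):
    """Population on the last site after applying all piecewise-constant steps."""
    n = len(control_sequence[0])
    psi = np.zeros(n, dtype=complex)
    psi[0] = 1.0
    hop = J * (np.eye(n, k=1) + np.eye(n, k=-1))
    for fields in control_sequence:
        w, V = np.linalg.eigh(hop + np.diag(fields))
        psi = (V * np.exp(-1j * w * dt)) @ (V.conj().T @ psi)
    return abs(psi[-1]) ** 2

def noisy_average(control_sequence, dt, sigma, trials=200, seed=0):
    """Mean and standard deviation of the transfer probability when each field
    value is perturbed by Gaussian noise of width sigma (assumed noise model)."""
    rng = np.random.default_rng(seed)
    seq = np.asarray(control_sequence, dtype=float)
    probs = [
        final_transfer_probability(seq + rng.normal(0.0, sigma, seq.shape), dt)
        for _ in range(trials)
    ]
    return float(np.mean(probs)), float(np.std(probs))
```

Comparing `noisy_average(seq, dt, sigma=0.0)` with the same call at a nonzero `sigma` gives a rough picture of how quickly a given sequence degrades, whichever optimizer produced it.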
Figure 2: The cartoon in the Figure presents the main ingredients of the Genetic Algorithm. a) The sixteen possible actions, each of which can appear on a control sequence at any position in it. b) An initial population of four individuals, each one endowed with its own chromosome. The chromosome co
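The GA ingredients in the Figure 2 caption (a set of actions, chromosomes made of action indices, a population that evolves) can be sketched as below. The caption is truncated and does not list the sixteen actions, so the example uses a hypothetical five-action set: no field, or a field of strength `B` on one site. The selection, crossover, and mutation operators, the chain length, and every numerical parameter are assumptions chosen only to keep the sketch short.

```python
import numpy as np

N_SITES = 4   # hypothetical chain length
B = 10.0      # hypothetical field strength
DT = 0.15     # hypothetical step duration
# Hypothetical action set: action 0 applies no field, action k puts field B on site k.
ACTIONS = np.vstack([np.zeros(N_SITES), B * np.eye(N_SITES)])

def fitness(chromosome, J=1.0):
    """Largest last-site population reached while applying the chromosome's actions."""
    psi = np.zeros(N_SITES, dtype=complex)
    psi[0] = 1.0
    hop = J * (np.eye(N_SITES, k=1) + np.eye(N_SITES, k=-1))
    best = 0.0
    for a in chromosome:
        w, V = np.linalg.eigh(hop + np.diag(ACTIONS[a]))
        psi = (V * np.exp(-1j * w * DT)) @ (V.conj().T @ psi)
        best = max(best, abs(psi[-1]) ** 2)
    return best

def evolve(pop_size=30, length=20, generations=40, p_mut=0.05, seed=0):
    """Truncation selection with elitism, one-point crossover, per-gene mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, len(ACTIONS), size=(pop_size, length))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        order = np.argsort(scores)[::-1]
        pop = pop[order]
        history.append(scores[order[0]])
        elite = pop[: pop_size // 2]              # best half survives unchanged
        children = []
        while len(children) < pop_size - len(elite):
            p1, p2 = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, length)         # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            mask = rng.random(length) < p_mut     # genes to mutate
            child[mask] = rng.integers(0, len(ACTIONS), size=mask.sum())
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([fitness(c) for c in pop])
    best = int(np.argmax(scores))
    history.append(scores[best])
    return pop[best], scores[best], history
```

Because the best half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.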


This review was generated by AI and reviewed by a human editor.