QUICK REVIEW

[Paper Review] The pros and cons of using deep reinforcement learning or genetic algorithms to design control schemes for quantum state transfer on qubit chains

Sofía Perón Santana, Ariel Fiuri|arXiv (Cornell University)|2026. 01. 09.

Quantum Information and Cryptography | Citations: 0

One-Line Summary

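The paper compares genetic algorithms (GA) and deep reinforcement learning (DRL) for designing the external controls that transfer quantum states along qubit chains. GA achieves fast, high-fidelity transfer, whereas DRL offers robustness to noise but struggles on longer chains and can be computationally expensive.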

ABSTRACT

In recent years, control methods based on different optimization techniques have shed light on the possibilities of processing information in many quantum systems. When exploring the transmission of quantum states, short transmission times are mandatory to avoid the deleterious effects of the multiple sources of decoherence that spoil the transmission process. In particular, using Reinforcement Learning to devise sequences of step-wise external controls provides good transfer policies at short transmission times. We present two approaches to control the transmission of quantum states in qubit chains using external controls to force the dynamical evolution of the chain state. The first approach relies on the well-known Genetic Algorithm to generate a sequence of external controls, while the second approach uses a variant of Reinforcement Learning. The Genetic Algorithm achieves excellent transmission fidelity at transmission times as short as those of Reinforcement Learning, surpassing the fidelities achieved by the latter method. Nevertheless, the Reinforcement Learning method offers robust control policies when the control pulses are noisy enough, owing to imperfect timing of the pulses, deficient control devices, or other sources of phase decoherence. We present the regime where each method is best suited to control the transmission of arbitrary qubit states.

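Motivation and Goals

Motivates optimization-based control as a way to mitigate decoherence during quantum state transfer along qubit chains.
Compares GA and DRL approaches for generating the external control sequences that drive the state transfer.
Characterizes performance under fluctuations and identifies the regimes in which each method excels.
Provides guidance on when to prefer GA over DRL for fast, robust quantum state transfer.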

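Proposed Method

Models the qubit chain with an XX Hamiltonian and uses piecewise-constant external fields h_i(t) as the controls.
Represents a control sequence either as a chromosome (GA) or as the actions of a Deep Q-Network within an MDP framework (DRL).
Evolves the GA control sequences, evaluating each with a fitness equal to the maximum transfer probability reached within a time window.
DRL learns action-value estimates with a Deep Q-Network comprising an online Q-network, a target network, and a replay memory.
Compares site-wise controls against a fixed action set and analyzes performance as a function of chain length.
Trains and validates the DRL models under noisy, imperfect controls to test robustness to fluctuations.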
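To make the model concrete, here is a minimal Python sketch of the one-excitation dynamics under piecewise-constant fields. It assumes ħ = 1, a uniform coupling J (the `coupling` parameter), and the convention that restricting the XX Hamiltonian to the one-excitation subspace gives an N×N tridiagonal matrix with J on the off-diagonals and the fields h_i on the diagonal (constant offsets dropped). The function names are hypothetical, and the paper's exact prefactors may differ.

```python
import numpy as np


def one_excitation_hamiltonian(n, fields, coupling=1.0):
    """Tridiagonal XX Hamiltonian in the one-excitation subspace (assumed convention):
    hopping `coupling` between neighbouring sites, local field h_i on the diagonal."""
    h = np.diag(np.asarray(fields, dtype=float))
    off = coupling * np.ones(n - 1)
    return h + np.diag(off, 1) + np.diag(off, -1)


def step_propagator(hamiltonian, tau):
    """U = exp(-i H tau), computed from the eigendecomposition of the real symmetric H."""
    w, v = np.linalg.eigh(hamiltonian)
    return (v * np.exp(-1j * w * tau)) @ v.conj().T


def transfer_probabilities(field_sequence, tau, coupling=1.0):
    """Start with the excitation on site 1 and apply one step per row of fields.

    field_sequence has shape (steps, n); row k holds h_1..h_n during step k.
    Returns |<n|psi(t_k)>|^2 after every step.
    """
    field_sequence = np.asarray(field_sequence, dtype=float)
    _, n = field_sequence.shape
    psi = np.zeros(n, dtype=complex)
    psi[0] = 1.0
    probs = []
    for fields in field_sequence:
        u = step_propagator(one_excitation_hamiltonian(n, fields, coupling), tau)
        psi = u @ psi
        probs.append(abs(psi[-1]) ** 2)
    return np.array(probs)
```

With zero fields and N = 3, the uniform XX chain transfers the excitation perfectly at t = π/(√2 J), so the last entry of `transfer_probabilities(np.zeros((4, 3)), np.pi / np.sqrt(2) / 4)` should be 1 up to rounding. Taking `max` over the returned array gives the maximum-transfer-probability figure of merit described above.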
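The DQN ingredients listed in the method (online Q-network, target network, replay memory) can be illustrated with the bookkeeping that surrounds the network. This is a generic sketch, not the paper's implementation: the neural network is left out, weights are assumed to be a dict of NumPy arrays, and the names `ReplayMemory`, `epsilon_greedy`, `td_targets`, `sync_target` and the default `gamma` are hypothetical.

```python
import random
from collections import deque, namedtuple

import numpy as np

# Hypothetical helpers: only the DQN bookkeeping is shown, not the Q-network itself.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are discarded first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q_target, dones, gamma=0.95):
    """y = r + gamma * max_a Q_target(s', a), with no bootstrapping after terminal steps."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    next_max = np.max(np.asarray(next_q_target, dtype=float), axis=1)
    return rewards + gamma * (1.0 - dones) * next_max


def sync_target(online, target, tau=1.0):
    """Hard copy (tau=1) or soft blend of the online weights into the target network."""
    return {k: tau * online[k] + (1.0 - tau) * target[k] for k in online}
```

In the transfer problem the state would encode the chain, each action would be one allowed control configuration, and the reward would typically be tied to the transfer probability; these mappings are assumptions, since the review does not specify the paper's reward. Computing targets from a separate, slowly updated target network is what stabilizes training relative to bootstrapping from the online network directly.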

Figure 1: The cartoon in the figure depicts a system of $N$ qubits and its time evolution. The initial state, shown at the leftmost extreme of the cartoon, corresponds to a one-excitation quantum state. The step-wise evolution operator for a given interval, $U_{k}=U(\tau_{k})$ , acts over all the qu
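The step-wise evolution in Figure 1 can be written out as follows. This is a sketch assuming ħ = 1, M steps, an excitation initially on site 1 (state |1⟩), and H^{(k)} denoting the chain Hamiltonian with the control fields held at their k-th values h^{(k)}; the symbols M, t_m, P and 𝓕 are introduced here for illustration.

```latex
\begin{aligned}
U_k &\equiv U(\tau_k) = \exp\!\left(-i\,H^{(k)}\,\tau_k\right),\\
|\psi(t_m)\rangle &= U_m U_{m-1}\cdots U_1\,|1\rangle,
\qquad t_m = \sum_{k=1}^{m}\tau_k,\\
P(t_m) &= \left|\langle N|\psi(t_m)\rangle\right|^2,
\qquad \mathcal{F} = \max_{1 \le m \le M} P(t_m).
\end{aligned}
```

Here 𝓕 corresponds to the fitness used for the GA: the highest probability of finding the excitation on the last site at any step within the time window.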

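Experimental Results

Research Questions

RQ1: On homogeneous qubit chains, do GA-derived control sequences achieve high-fidelity quantum state transfer compared with DRL-derived sequences?
RQ2: Which method performs better under noisy or imperfect controls, at short versus long transmission times?
RQ3: In which regimes of chain length and control parameters does GA outperform DRL, and vice versa?
RQ4: How robust are the learned control policies to open-quantum-system effects and to fluctuations in the control hardware?
RQ5: What are the computational-cost trade-offs between GA and DRL for this problem?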

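Key Findings

GA achieves excellent transfer fidelity at short transmission times, often matching or exceeding DRL performance.
Site-wise GA control outperforms the action set of Zhang et al. in both fidelity and robustness across chain lengths.
DRL (DQN) struggles to produce high-quality state transfer on long chains, although on short chains it sometimes approaches the quantum speed limit.
DRL policies trained in fluctuating environments are robust to noise, but results vary between training runs and require substantial computation.
For long chains, GA offers faster convergence and more reliable high-fidelity transfer, whereas DRL can provide robustness at the cost of computation and consistency.
Under fluctuations, DRL-trained policies can maintain their performance, whereas GA sequences may degrade unless fluctuations are included during training.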
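The robustness comparisons above amount to averaging a figure of merit over noisy realizations of a control sequence. The sketch below assumes additive Gaussian noise of standard deviation `noise_std` on every field amplitude, which is only one illustrative stand-in for the pulse-timing errors, imperfect devices, and phase decoherence mentioned in the abstract; the function names and noise model are hypothetical, and the dynamics use the same assumed one-excitation XX convention (ħ = 1, uniform coupling, fields on the diagonal).

```python
import numpy as np


def max_transfer_probability(field_sequence, tau, coupling=1.0):
    """Max over steps of |<N|psi>|^2 for piecewise-constant fields (one-excitation subspace)."""
    seq = np.asarray(field_sequence, dtype=float)
    n = seq.shape[1]
    hop = coupling * (np.eye(n, k=1) + np.eye(n, k=-1))
    psi = np.zeros(n, dtype=complex)
    psi[0] = 1.0  # excitation starts on site 1
    best = 0.0
    for fields in seq:
        w, v = np.linalg.eigh(np.diag(fields) + hop)
        psi = (v * np.exp(-1j * w * tau)) @ (v.conj().T @ psi)
        best = max(best, abs(psi[-1]) ** 2)
    return float(best)


def noisy_average(field_sequence, tau, noise_std, realizations=200, seed=0):
    """Mean and spread of the figure of merit under assumed additive Gaussian field noise."""
    rng = np.random.default_rng(seed)
    seq = np.asarray(field_sequence, dtype=float)
    values = [
        max_transfer_probability(seq + rng.normal(0.0, noise_std, seq.shape), tau)
        for _ in range(realizations)
    ]
    return float(np.mean(values)), float(np.std(values))
```

Using `noisy_average(...)[0]` instead of the noiseless value as the GA fitness is one simple way to include fluctuations in training, in the spirit of the last finding above.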

Figure 2: The cartoon in the Figure presents the main ingredients of the Genetic Algorithm. a) The sixteen possible actions, each of which can appear on a control sequence at any position in it. b) An initial population of four individuals, each one endowed with its own chromosome. The chromosome co
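The GA ingredients in Figure 2 (a set of sixteen actions, chromosomes built from them, a population evolved by fitness) can be sketched as follows. None of the details are taken from the paper, whose action set is not reproduced in this review: the sixteen actions are assumed to be the 2^4 on/off patterns of a field of strength `b` on the first two and last two sites, selection is a 3-way tournament, crossover is one-point, mutation resamples genes uniformly, and the two fittest individuals survive unchanged (elitism). Fitness is the maximum transfer probability along the sequence under the same assumed one-excitation XX convention. All names and defaults are hypothetical.

```python
import itertools

import numpy as np


def build_actions(n, b=1.0):
    """Hypothetical 16-action set: on/off field b on sites 1, 2, N-1 and N."""
    assert n >= 4, "the four controlled sites must be distinct"
    sites = [0, 1, n - 2, n - 1]
    actions = np.zeros((16, n))
    for row, pattern in enumerate(itertools.product([0.0, b], repeat=4)):
        actions[row, sites] = pattern
    return actions


def fitness(chromosome, actions, tau, coupling=1.0):
    """Maximum probability of finding the excitation on site N along the sequence."""
    n = actions.shape[1]
    hop = coupling * (np.eye(n, k=1) + np.eye(n, k=-1))
    psi = np.zeros(n, dtype=complex)
    psi[0] = 1.0
    best = 0.0
    for gene in chromosome:
        w, v = np.linalg.eigh(np.diag(actions[gene]) + hop)
        psi = (v * np.exp(-1j * w * tau)) @ (v.conj().T @ psi)
        best = max(best, abs(psi[-1]) ** 2)
    return float(best)


def genetic_algorithm(n=4, length=20, tau=0.15, pop_size=30, generations=40,
                      mutation_rate=0.05, elite=2, seed=0):
    rng = np.random.default_rng(seed)
    actions = build_actions(n)
    pop = rng.integers(0, len(actions), size=(pop_size, length))  # chromosomes
    for _ in range(generations):
        scores = np.array([fitness(c, actions, tau) for c in pop])
        new_pop = [pop[i].copy() for i in np.argsort(scores)[::-1][:elite]]  # elitism
        while len(new_pop) < pop_size:
            # 3-way tournament selection for each parent
            p1, p2 = (max(rng.choice(pop_size, 3, replace=False), key=lambda i: scores[i])
                      for _ in range(2))
            cut = rng.integers(1, length)  # one-point crossover
            child = np.concatenate([pop[p1][:cut], pop[p2][cut:]])
            mutate = rng.random(length) < mutation_rate  # uniform gene mutation
            child[mutate] = rng.integers(0, len(actions), size=mutate.sum())
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([fitness(c, actions, tau) for c in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])
```

`best_sequence, score = genetic_algorithm()` returns the fittest chromosome (a sequence of action indices) and its fitness. Because the fitness is deterministic and the elite survive unchanged, the best score never decreases from one generation to the next.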

This review was generated by AI and reviewed by a human editor.