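[Paper Review] Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem
RL-CMSA tackles the single-depot min-max mTSP by combining constructive search guided by learned pairwise city interactions with an exact set-covering MILP and local improvement, and it outperforms a comparable hybrid GA baseline under similar time limits.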
The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.
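To make the min-max objective concrete, the following is a minimal Python sketch that evaluates a candidate solution. The function names (`tour_length`, `min_max_cost`), the Euclidean distance on coordinates, and the toy instance are illustrative assumptions and are not taken from the paper.

```python
import math

def tour_length(route, coords, depot=0):
    """Length of the closed tour depot -> route[0] -> ... -> route[-1] -> depot."""
    path = [depot, *route, depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def min_max_cost(routes, coords, depot=0):
    """Min-max mTSP objective: the length of the longest of the m tours."""
    return max(tour_length(r, coords, depot) for r in routes)

# Toy instance (assumed): depot at index 0 and four customers.
coords = [(0, 0), (3, 0), (3, 4), (0, 4), (-3, 0)]
balanced = [[1, 2], [3, 4]]      # both tours have length 12
unbalanced = [[1, 2, 3], [4]]    # tours of length 14 and 6

print(min_max_cost(balanced, coords))    # 12.0
print(min_max_cost(unbalanced, coords))  # 14.0
```

Both solutions visit every customer exactly once, but the balanced split has the lower objective value because only the longest tour counts.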
Research Motivation and Objectives
- Address the single-depot min–max mTSP where the goal is to minimize the longest tour among m tours.
- Develop a hybrid framework that integrates constructive solution generation, exact optimization, and reinforcement learning for adaptation.
- Balance exploration and exploitation through learned q-value guidance and age-based pool adaptation.
- Evaluate RL-CMSA against state-of-the-art hybrids on random and TSPLIB instances under comparable time limits.
Proposed Method
- Iteratively construct diverse solutions via probabilistic clustering guided by learned pairwise q-values.
- Merge constructed routes into a candidate pool and prune dominated routes by canonical signatures and length.
- Solve a restricted set-covering MILP to select m routes that cover all customers while minimizing the maximum route length.
- Improve the resulting solution with inter-route remove, shift, and swap moves to reduce the longest route.
- Learn city-pair co-occurrence counts to reinforce beneficial city pairings and discourage unhelpful ones, updating q-values accordingly.
- Adapt the candidate pool through an age-based policy to maintain a compact, up-to-date set of routes.
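The learning and adaptation steps listed above can be sketched roughly as follows. This review does not give the paper's update formulas, so everything here is an assumption made for illustration: the learning rate `alpha`, the rule that moves a pair's q-value toward 1 when the two cities share a route in a good solution and toward 0 otherwise, the `max_age` threshold, and the helper names `pick_cluster`, `reinforce`, and `age_pool`. The ageing rule follows the usual CMSA pattern (reset the age of routes used in the solved sub-problem, increment the rest, drop routes that grow too old).

```python
import random
from itertools import combinations

def pick_cluster(city, clusters, q, rng):
    """Sample a cluster index with probability proportional to the mean
    q-value between `city` and the cities already in each cluster."""
    weights = []
    for members in clusters:
        if members:
            total = sum(q[frozenset((city, c))] for c in members)
            weights.append(total / len(members))
        else:
            weights.append(0.5)  # assumed neutral weight for an empty cluster
    return rng.choices(range(len(clusters)), weights=weights, k=1)[0]

def reinforce(q, routes, alpha=0.1):
    """Assumed update: pull q toward 1 for pairs that share a route in a
    high-quality solution and toward 0 for all other pairs."""
    together = {frozenset(p) for r in routes for p in combinations(r, 2)}
    for pair in q:
        target = 1.0 if pair in together else 0.0
        q[pair] += alpha * (target - q[pair])

def age_pool(pool, used, max_age=3):
    """Reset the age of routes in `used`, increment the age of all others,
    and drop routes whose age exceeds `max_age`."""
    aged = {}
    for route, age in pool.items():
        new_age = 0 if route in used else age + 1
        if new_age <= max_age:
            aged[route] = new_age
    return aged

customers = [1, 2, 3, 4]
q = {frozenset(p): 0.5 for p in combinations(customers, 2)}
reinforce(q, routes=[[1, 2], [3, 4]])

pool = {(1, 2): 0, (3, 4): 3, (1, 3): 3}
pool = age_pool(pool, used={(1, 2), (3, 4)})

rng = random.Random(0)
chosen = pick_cluster(3, [[4], [1, 2]], q, rng)
```

After one reinforcement step, the pairs {1, 2} and {3, 4} have q-values of 0.55 and every other pair has 0.45, so city 3 is slightly more likely to join the cluster containing city 4. The route `(1, 3)` is pruned from the pool because its age would exceed `max_age`.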
Experimental Results
Research Questions
- RQ1: Can reinforcement-guided solution construction and adaptive pool management produce high-quality min–max mTSP solutions within time limits?
- RQ2: Does combining a set-covering MILP with targeted local search yield robust improvements over strong hybrid heuristics for min–max mTSP?
- RQ3: How effective are learned city-pair co-occurrences (q-values) at guiding clustering toward balanced, high-quality solutions?
- RQ4: What is the impact of ageing and pruning on maintaining diversity and convergence in the RL-CMSA framework?
Key Results
- RL-CMSA consistently finds near-best solutions on both random and TSPLIB instances.
- RL-CMSA outperforms a state-of-the-art Hybrid Genetic Algorithm (HGA) under comparable time limits, with stronger performance as instance size and number of salesmen grow.
- A large, diverse candidate pool combined with high exploitation rates near the incumbent solution improves performance for certain values of m.
- Learned q-values stabilize around informative city-pair co-occurrences, guiding clustering toward balanced partitions.
- Age-based adaptation keeps the candidate pool compact and updated, contributing to scalable performance.
This review was generated by AI and reviewed by a human editor.