QUICK REVIEW

[Paper Review] Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

Guillem Rodríguez Corominas, Maria Jose Blesa | arXiv (Cornell University) | 2026-02-27

Vehicle Routing Optimization Methods | Citations: 0

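One-line summary

RL-CMSA tackles the single-depot min-max mTSP by combining constructive search guided by learned pairwise city interactions with an exact set-covering MILP and local improvement, and it outperforms a comparable hybrid GA baseline under similar time limits.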

ABSTRACT

The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.
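To make the min-max objective in the abstract concrete, here is a minimal Python sketch that evaluates it for a given set of tours. The function names, the Euclidean distance, and the toy coordinates are illustrative assumptions and do not come from the paper.

```python
import math

def tour_length(depot, customers, coords):
    """Length of the closed tour depot -> customers (in order) -> depot."""
    stops = [depot, *customers, depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def min_max_objective(depot, routes, coords):
    """Min-max mTSP objective: the length of the longest of the m tours."""
    return max(tour_length(depot, route, coords) for route in routes)

# Toy instance (assumed data): depot 0 and three customers on a 3 x 4 rectangle.
coords = {0: (0, 0), 1: (3, 0), 2: (3, 4), 3: (0, 4)}
balanced = [[1, 2], [3]]   # two salesmen share the customers
lopsided = [[1, 2, 3], []]  # one salesman does all the work

print(min_max_objective(0, balanced, coords))  # 12.0
print(min_max_objective(0, lopsided, coords))  # 14.0
```

The lopsided assignment has the larger objective value even though it uses a single efficient loop, which is the workload-balance effect the min-max variant is meant to capture.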

Research motivation and objectives

Address the single-depot min–max mTSP, where the goal is to minimize the longest of the m tours.
Develop a hybrid framework that integrates constructive solution generation, exact optimization, and reinforcement learning for adaptation.
Balance exploration and exploitation through learned q-value guidance and age-based pool adaptation.
Evaluate RL-CMSA against state-of-the-art hybrids on random and TSPLIB instances under comparable time limits.

Proposed method

Iteratively construct diverse solutions via probabilistic clustering guided by learned pairwise q-values.
Merge constructed routes into a candidate pool and prune dominated routes by canonical signatures and length.
Solve a restricted set-covering MILP to select m routes that cover all customers while minimizing the maximum route length.
Improve the resulting solution with inter-route remove, shift, and swap moves to reduce the longest route.
Learn city-pair co-occurrence counts to reinforce beneficial city pairings and discourage unhelpful ones, updating q-values accordingly.
Adapt the candidate pool through an age-based policy to maintain a compact, up-to-date set of routes.
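The solve step above can be written as a small MILP. The formulation below is a sketch under assumed notation, not the paper's exact model: R is the current route pool, C the set of customers, a_{ir} equals 1 if route r visits customer i, \ell_r is the length of route r, x_r selects route r, and z bounds the longest selected route. Whether the paper selects exactly m routes or at most m is not stated in this review, so the equality is also an assumption.

```latex
\begin{align*}
\min\quad & z \\
\text{s.t.}\quad
  & \sum_{r \in R} a_{ir}\, x_r \ge 1 && \forall i \in C \\
  & \sum_{r \in R} x_r = m \\
  & \ell_r\, x_r \le z && \forall r \in R \\
  & x_r \in \{0, 1\} && \forall r \in R
\end{align*}
```

Because the covering constraint is an inequality, a customer may appear in more than one selected route; such duplicates would have to be dropped from the extra routes afterwards, which is plausibly one role of the remove move in the local improvement step.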

Experimental results

Research questions

RQ1: Can reinforcement-guided construction and adaptive pool management produce high-quality min–max mTSP solutions within time limits?
RQ2: Does combining a set-covering MILP with targeted local search yield robust improvements over strong hybrid heuristics for min–max mTSP?
RQ3: How effective are learned city-pair co-occurrences (q-values) at guiding clustering toward balanced, high-quality solutions?
RQ4: What is the impact of ageing and pruning on maintaining diversity and convergence in the RL-CMSA framework?

Key findings

RL-CMSA consistently finds near-best solutions on both random and TSPLIB instances.
RL-CMSA outperforms a state-of-the-art Hybrid Genetic Algorithm (HGA) under comparable time limits, with stronger performance as instance size and number of salesmen grow.
Combining a large, diverse candidate pool with high exploitation rates near the incumbent solution improves performance for certain values of m.
Learned q-values stabilize around informative city-pair co-occurrences, guiding clustering toward balanced partitions.
Age-based adaptation keeps the candidate pool compact and updated, contributing to scalable performance.
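The q-value guidance mentioned in these findings can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the exponential-smoothing update, the mean-affinity score, and the names reinforce, pick_cluster, lr, and greedy_prob are hypothetical, and the paper's actual update and selection rules may differ.

```python
import itertools
import random

def reinforce(q, routes, reward, lr=0.1):
    """Move the q-value of every city pair sharing a route toward `reward`.

    Assumed update rule: q <- q + lr * (reward - q), with unseen pairs at 0.
    """
    for route in routes:
        for a, b in itertools.combinations(sorted(route), 2):
            old = q.get((a, b), 0.0)
            q[(a, b)] = old + lr * (reward - old)
    return q

def pick_cluster(city, clusters, q, rng, greedy_prob=0.8):
    """Pick a cluster index for `city`, favouring clusters with high mean q-value."""
    def affinity(cluster):
        if not cluster:
            return 0.0
        pairs = ((min(city, c), max(city, c)) for c in cluster)
        return sum(q.get(p, 0.0) for p in pairs) / len(cluster)

    scores = [affinity(cluster) for cluster in clusters]
    if rng.random() < greedy_prob:
        # Exploit: take the cluster with the highest affinity.
        return max(range(len(clusters)), key=scores.__getitem__)
    # Explore: sample in proportion to the shifted, strictly positive scores.
    low = min(scores)
    weights = [s - low + 1e-6 for s in scores]
    return rng.choices(range(len(clusters)), weights=weights)[0]

# Reward the pairs that co-occur in one (assumed) high-quality route.
q = reinforce({}, [[1, 2, 3]], reward=1.0, lr=0.5)
print(pick_cluster(3, [[1, 2], [4]], q, random.Random(0), greedy_prob=1.0))  # 0
```

With greedy_prob set to 1.0 the choice is deterministic, so city 3 joins the cluster holding cities 1 and 2, with which it was previously rewarded; lower values trade that exploitation for more diverse constructions.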


This review was generated by AI and checked by a human editor.
