QUICK REVIEW

[Paper Review] The Transformer Network for the Traveling Salesman Problem

Xavier Bresson, Thomas Laurent|arXiv (Cornell University)|Mar 4, 2021
Natural Language Processing Techniques | References: 36 | Cited by: 50
One-Line Summary

The paper adapts the Transformer architecture to the TSP and trains it with reinforcement learning, achieving small optimality gaps on TSP50 and TSP100 with beam-search decoding.

ABSTRACT

The Traveling Salesman Problem (TSP) is the most popular and most studied combinatorial problem, starting with von Neumann in 1951. It has driven the discovery of several optimization techniques such as cutting planes, branch-and-bound, local search, Lagrangian relaxation, and simulated annealing. The last five years have seen the emergence of promising techniques in which (graph) neural networks have been capable of learning new combinatorial algorithms. The main question is whether deep learning can learn better heuristics from data, i.e., replace human-engineered heuristics. This is appealing because developing algorithms that efficiently tackle NP-hard problems may require years of research, and many industry problems are combinatorial by nature. In this work, we propose to adapt the recently successful Transformer architecture, originally developed for natural language processing, to the combinatorial TSP. Training is done by reinforcement learning, hence without TSP training solutions, and decoding uses beam search. We report improved performance over recent learned heuristics, with optimality gaps of 0.004% for TSP50 and 0.39% for TSP100.
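To make the abstract's setup concrete, the objective and an auto-regressive model of a tour can be written as follows. The notation (cities x_1, ..., x_n, tour π, policy p_θ, optimal tour π*) is our own choice, not necessarily the paper's. A tour is a permutation of the n cities. Its cost is the closed Euclidean length. The decoder assigns a probability to a tour one city at a time.

```latex
\begin{aligned}
L(\pi \mid X) &= \sum_{t=1}^{n} \lVert x_{\pi_t} - x_{\pi_{t+1}} \rVert_2, \qquad \pi_{n+1} := \pi_1,\\
p_\theta(\pi \mid X) &= \prod_{t=1}^{n} p_\theta\left(\pi_t \mid \pi_1, \dots, \pi_{t-1}, X\right),\\
\mathrm{gap}(\pi) &= \frac{L(\pi \mid X) - L(\pi^\ast \mid X)}{L(\pi^\ast \mid X)}.
\end{aligned}
```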

Motivation and Objectives

  • Motivate the use of Transformer networks for combinatorial optimization, specifically the TSP.
  • Develop a Transformer encoder and auto-regressive decoder architecture tailored to the TSP.
  • Train the model with reinforcement learning, without requiring TSP solutions for supervision.
  • Evaluate decoding strategies (greedy, beam search) and compare against traditional solvers and learned heuristics.

Proposed Method

  • Cast the TSP as a translation problem: from a set of city coordinates to an ordered tour.
  • Encode the cities with a Transformer encoder that uses batch normalization. An auto-regressive decoder, initialized with a start token, then generates the tour one city at a time.
  • At each decoding step, a query attends over the encoded city representations to select the next unvisited city.
  • Train with policy-gradient reinforcement learning. The tour length is the cost to minimize (its negative is the reward), and a baseline that is updated during training reduces gradient variance.
  • Decode with sampling or beam search. Sampling draws the next city from the softmax distribution. Beam search keeps the top-scoring partial tours (beams) and returns the best complete tour.
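The decoding step described above can be sketched in plain Python. It uses masked scaled dot-product attention followed by a softmax over the unvisited cities.

This is a minimal illustration under assumptions, not the paper's implementation:
- The embeddings in `keys` are a hypothetical stand-in for the encoder output.
- The `make_query` function is a hypothetical stand-in for the decoder state.
- The first city is fixed rather than chosen through a learned start token.
- Multi-head attention and learned projections are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax; -inf scores get probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_city_probs(query, keys, visited):
    """Scaled dot-product scores between the decoder query and each encoded
    city, with already-visited cities masked to -inf before the softmax."""
    scale = math.sqrt(len(query))
    scores = [
        float("-inf") if i in visited
        else sum(q * k for q, k in zip(query, key)) / scale
        for i, key in enumerate(keys)
    ]
    return softmax(scores)

def greedy_decode(keys, make_query, start=0):
    """Build a tour one city at a time, always taking the most likely city."""
    tour, visited = [start], {start}
    while len(tour) < len(keys):
        probs = next_city_probs(make_query(tour), keys, visited)
        nxt = max(range(len(probs)), key=probs.__getitem__)
        tour.append(nxt)
        visited.add(nxt)
    return tour

# Toy usage: hypothetical encoded cities, and a stand-in query that simply
# reuses the embedding of the last visited city.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
tour = greedy_decode(keys, make_query=lambda t: keys[t[-1]])
print(tour)  # [0, 2, 1, 3]
```

Replacing the `max` with a random draw weighted by `probs` turns this into the sampling decoder mentioned in the list.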

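The policy-gradient training described in the Proposed Method list can be sketched as a REINFORCE surrogate loss with a baseline. In this minimal sketch, tour lengths and log-probabilities are plain numbers. In practice, `log_probs` would come from the decoder inside an autodiff framework. The update rule in `should_update_baseline` (compare mean tour lengths on a held-out set) is a hypothetical simplification, not necessarily the paper's exact criterion.

```python
from statistics import mean

def reinforce_surrogate_loss(tour_lengths, baseline_lengths, log_probs):
    """Surrogate loss whose gradient w.r.t. the policy parameters is the
    REINFORCE estimate mean((L - b) * grad log p(tour)).

    The advantage (L - b) is treated as a constant; in an autodiff framework
    only `log_probs` would carry gradients.
    """
    return mean(
        (length - base) * logp
        for length, base, logp in zip(tour_lengths, baseline_lengths, log_probs)
    )

def should_update_baseline(policy_lengths, baseline_lengths):
    """Hypothetical update rule: adopt the current policy as the new baseline
    when its tours on a held-out set are shorter on average."""
    return mean(policy_lengths) < mean(baseline_lengths)

# Toy batch: the first tour is longer than its baseline, so minimizing the loss
# pushes its log-probability down. The second is shorter, so it is pushed up.
loss = reinforce_surrogate_loss([6.0, 5.0], [5.5, 5.5], [-2.0, -1.0])
print(loss)  # -0.25
```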
Experimental Results

Research Questions

  • RQ1: Can a Transformer architecture learn effective TSP heuristics through reinforcement learning, without supervised TSP solutions?
  • RQ2: How does the Transformer-based TSP solver compare with classical solvers and prior learned heuristics on standard TSP instances (e.g., n = 50 and n = 100)?
  • RQ3: Which decoding strategies (greedy, beam search, sampling) give the best trade-off between solution quality and inference time?
  • RQ4: Which architectural choices (batch vs. layer normalization, encoder/decoder design) affect performance when the TSP is treated as a sequence-generation task?
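RQ4 contrasts batch normalization (BN) with layer normalization (LN). The two differ in the axis over which the statistics are computed, as this plain-Python sketch illustrates. The sketch omits the learned scale and shift, as well as the running averages that BN uses at inference time. The function names are our own.

```python
import math

def batch_norm(rows, eps=1e-5):
    """Normalize each feature (column) with statistics taken across all rows,
    e.g. all city embeddings in a batch. Learned scale/shift omitted."""
    n, d = len(rows), len(rows[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in rows]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        for i in range(n):
            out[i][j] = (rows[i][j] - mu) / math.sqrt(var + eps)
    return out

def layer_norm(rows, eps=1e-5):
    """Normalize each row (one embedding) with its own mean and variance."""
    out = []
    for row in rows:
        mu = sum(row) / len(row)
        var = sum((x - mu) ** 2 for x in row) / len(row)
        out.append([(x - mu) / math.sqrt(var + eps) for x in row])
    return out

rows = [[1.0, 2.0], [3.0, 6.0]]
print(batch_norm(rows))  # each column now has mean 0
print(layer_norm(rows))  # each row now has mean 0
```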

Key Findings

TSP50 results:

Method                     | Obj    | Gap     | T Time (total) | I Time (inference)
Concorde [3] (TSP50)       | 5.689  | 0.00%   | 2m*            | 0.05s
Gurobi [16] (TSP50)        | -      | 0.00%*  | 2m*            | -
Nearest insertion          | 7.00*  | 22.94%* | 0s*            | -
Farthest insertion [21]    | 6.01*  | 5.53%*  | 2s*            | -
OR tools [15]              | 5.80*  | 1.83%*  | -              | -
LKH-3 [19]                 | -      | 0.00%*  | 5m*            | -
Joshi et al. [23] (TSP50)  | 5.87   | 3.10%   | 55s            | -
Kool et al. [26] (B=100)   | 5.692  | 0.04%   | 2.3m           | 0.09s
Our model (B=100)          | 5.692  | 0.04%   | 2.3m           | 0.09s
Our model (B=1000)         | 5.690  | 0.01%   | 17.8m          | 0.15s
Our model (B=2500)         | 5.689  | 0.004%  | 44.8m          | 0.33s
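In the table, Obj is the average tour length and Gap is its relative distance to the optimum. Here is a minimal sketch of both metrics. It assumes Euclidean city coordinates and assumes the gap is averaged over instances, which is a common convention. The paper's exact averaging is not stated in this review.

```python
import math
from statistics import mean

def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the order given by tour."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def mean_gap_percent(lengths, optimal_lengths):
    """Average per-instance optimality gap, in percent."""
    return 100.0 * mean(
        (length - opt) / opt for length, opt in zip(lengths, optimal_lengths)
    )

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tour_length(square, [0, 1, 2, 3]))  # 4.0
print(mean_gap_percent([4.4, 5.0], [4.0, 5.0]))
```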
  • The Transformer-based solver improves on recent learned heuristics, with reported optimality gaps of 0.004% on TSP50 and 0.39% on TSP100.
  • The model also has a faster decoding configuration than the beam-search rows in the table. With it, the model reaches an objective of about 5.707 (0.31% gap) on TSP50 in 13.7 s total, and about 7.875 (1.42% gap) on TSP100 in 4.6 s total.
  • The method uses a Transformer encoder with batch normalization and a specialized auto-regressive decoder that incorporates a start token and positional encodings for tour ordering.
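  • Prior work found that beam search and sampling improve on greedy decoding, and the paper uses them to further reduce the optimality gap. Larger beams trade time for quality: on TSP50, B=100 gives a 0.04% gap in 2.3m, while B=2500 gives 0.004% in 44.8m.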
  • Compared with Concorde and other classical solvers, the neural approach reaches competitive but not optimal gaps. Concorde, Gurobi, and LKH-3 reach a 0.00% gap on TSP50. Concorde's inference time in the table (0.05s) is also lower than the model's (0.09s to 0.33s).
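The beam search mentioned in the findings (B = 100, 1000 and 2500 in the table) can be sketched as follows. This is a hedged illustration, not the paper's code, and it rests on two assumptions:
- `distance_log_probs` is a hypothetical nearest-neighbour-style stand-in for the trained decoder.
- The final tour is the shortest finished tour among the beams.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour (returns to the first city)."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def distance_log_probs(coords):
    """Hypothetical stand-in for a trained decoder: log-softmax over the negative
    distance from the current city; visited cities get log-probability -inf."""
    def log_prob_fn(tour):
        here = coords[tour[-1]]
        scores = [-math.inf if i in tour else -math.dist(here, c)
                  for i, c in enumerate(coords)]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        return [s - log_z for s in scores]
    return log_prob_fn

def beam_search(n, log_prob_fn, beam_width, score_fn, start=0):
    """Keep the beam_width partial tours with the highest cumulative
    log-probability, then return the finished tour with the lowest score_fn."""
    beams = [([start], 0.0)]
    for _ in range(n - 1):
        candidates = []
        for tour, logp in beams:
            step = log_prob_fn(tour)  # one log-probability per city
            for city in range(n):
                if city not in tour:
                    candidates.append((tour + [city], logp + step[city]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return min((tour for tour, _ in beams), key=score_fn)

coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best = beam_search(len(coords), distance_log_probs(coords), beam_width=6,
                   score_fn=lambda t: tour_length(coords, t))
print(best, tour_length(coords, best))
```

With `beam_width=1`, the search reduces to greedy decoding. Wider beams explore more partial tours at a higher cost, which is consistent with the growing total times across the B=100, 1000 and 2500 rows.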

This review was generated by AI and checked by a human editor.