QUICK REVIEW

[Paper Review] Solving NP-Hard Problems on Graphs by Reinforcement Learning without Domain Knowledge

Kenshin Abe, Zijian Xu|arXiv (Cornell University)|May 28, 2019

Advanced Graph Neural Networks | References: 2 | Cited by: 21

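One-Sentence Summary

This paper proposes a reinforcement learning framework that solves NP-hard graph problems without domain knowledge, inspired by the self-play and tree search of AlphaGo Zero. By adapting deep Q-learning to continuous (real-valued) rewards and combining it with Graph Isomorphism Networks, the method generalizes well across five kinds of NP-hard problems, with better solution quality and adaptability than S2V-DQN.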

ABSTRACT

There have been increasing challenges to solve combinatorial optimization problems by machine learning. Khalil et al. proposed an end-to-end reinforcement learning framework, S2V-DQN, which automatically learns graph embeddings to construct solutions to a wide range of problems. To improve the generalization ability of their Q-learning method, we propose a novel learning strategy based on AlphaGo Zero which is a Go engine that achieved a superhuman level without the domain knowledge of the game. Our framework is redesigned for combinatorial problems, where the final reward might take any real number instead of a binary response, win/lose. In experiments conducted for five kinds of NP-hard problems including MinimumVertexCover and MaxCut, our method is shown to generalize better to various graphs than S2V-DQN. Furthermore, our method can be combined with recently-developed graph neural network (GNN) models such as the Graph Isomorphism Network, resulting in even better performance. This experiment also gives an interesting insight into a suitable choice of GNN models for each task.
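To make the abstract's point about real-valued final rewards concrete, the following is a minimal Python sketch for MaxCut. It assumes the reward is simply the number of cut edges. The paper's exact reward definition and any normalization are not given in this review, and the graph, the partitions, and the name `cut_value` are illustrative only.

```python
def cut_value(edges, side):
    """Count the edges whose endpoints fall on opposite sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


# Hypothetical example graph: the 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Two candidate partitions; each node is assigned to side 0 or side 1.
alternating = {0: 0, 1: 1, 2: 0, 3: 1}
lopsided = {0: 0, 1: 0, 2: 0, 3: 1}

# Assumption: the final reward is the raw cut size, a real number rather
# than a win/lose signal.
reward_alternating = float(cut_value(edges, alternating))
reward_lopsided = float(cut_value(edges, lopsided))
```

Unlike a game outcome, the two partitions are not just "won" or "lost": they receive different real-valued rewards that rank them by quality.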

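Motivation and Objectives

Address the challenge of using machine learning to solve diverse NP-hard combinatorial optimization problems on graphs without relying on hand-crafted features or domain-specific rules.
Improve generalization across different graph structures by adapting AlphaGo Zero's self-play and Monte Carlo tree search to a continuous-reward setting.
Integrate the proposed method with modern graph neural networks, such as the Graph Isomorphism Network, to strengthen representation learning and solution quality.
Investigate how different graph neural network architectures affect performance on each optimization task.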

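Proposed Method

Adapts the AlphaGo Zero framework to combinatorial optimization by replacing the binary reward with a continuous real-valued reward that reflects solution quality.
Uses a deep Q-learning agent that approximates the value function with a neural network and iteratively selects graph nodes or edges to build a solution.
Trains via self-play combined with Monte Carlo tree search to guide exploration and policy improvement, without human demonstrations or domain-specific reward design.
Uses the Graph Isomorphism Network (GIN) as the backbone neural network to learn expressive, permutation-equivariant graph representations.
Trains the agent end to end with policy gradient updates, optimizing the cumulative reward over the solution construction process.
Applies a curriculum learning strategy that gradually increases graph complexity during training to improve convergence and generalization.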
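The step-by-step solution construction described above can be illustrated with a small Python sketch for Minimum Vertex Cover. This is not the paper's implementation. The `score` argument stands in for whatever a trained network and tree search would provide, and the `uncovered_degree` heuristic used here is a hypothetical placeholder chosen only so that the example runs. The final reward is assumed to be the negated cover size.

```python
def build_vertex_cover(adjacency, score):
    """Add nodes one at a time until every edge has a selected endpoint.

    `score(node, uncovered)` is a placeholder for a learned evaluation.
    Returns the cover and a real-valued final reward (assumed: -|cover|).
    """
    # Store each undirected edge once as a frozenset of its endpoints.
    uncovered = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
    cover = []
    while uncovered:
        candidates = [v for v in adjacency if v not in cover]
        chosen = max(candidates, key=lambda v: score(v, uncovered))
        cover.append(chosen)
        # Drop every edge that the newly chosen node covers.
        uncovered = {e for e in uncovered if chosen not in e}
    return cover, -float(len(cover))


def uncovered_degree(node, uncovered):
    """Hypothetical stand-in score: number of uncovered edges at `node`."""
    return sum(1 for e in uncovered if node in e)


# Hypothetical star graph: center 0 connected to leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cover, reward = build_vertex_cover(star, uncovered_degree)
```

On the star graph the placeholder score picks the center first, which covers every edge, so the episode ends after one step with a cover of size one. Swapping in a different `score` changes the choices but not the construction loop.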

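Experimental Results

Research Questions

RQ1: Can a reinforcement learning framework trained without domain knowledge generalize well across diverse NP-hard graph problems?
RQ2: How does adapting AlphaGo Zero's self-play and tree search to a continuous-reward setting affect solution quality in combinatorial optimization?
RQ3: How does using the Graph Isomorphism Network affect the performance and generalization of the proposed method?
RQ4: How does the choice of graph neural network architecture affect solution quality on each optimization task?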

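Key Findings

On five kinds of NP-hard problems, including Minimum Vertex Cover and Max Cut, the proposed method outperforms S2V-DQN across diverse graph types and generalizes better.
Combining the Graph Isomorphism Network with the reinforcement learning framework yields better solution quality than other GNN variants.
Across all tested problems, the method achieves competitive or better solution quality without any domain-specific reward design or hand-crafted features.
Ablation experiments confirm that combining self-play with continuous-reward learning improves generalization and convergence speed.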
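For readers unfamiliar with the Graph Isomorphism Network mentioned in these findings, the sketch below shows one GIN-style update in plain Python: each node combines (1 + eps) times its own feature vector with the sum of its neighbours' vectors and passes the result through an MLP. The `toy_mlp` here is a fixed, hypothetical function standing in for learned weights, and the graph and features are illustrative. The review does not describe how the paper configures its GIN layers.

```python
def gin_layer(adjacency, features, mlp, eps=0.0):
    """One GIN-style update: h_v <- mlp((1 + eps) * h_v + sum of neighbour h_u)."""
    updated = {}
    for v, neighbours in adjacency.items():
        dim = len(features[v])
        combined = [
            (1.0 + eps) * features[v][i] + sum(features[u][i] for u in neighbours)
            for i in range(dim)
        ]
        updated[v] = mlp(combined)
    return updated


def toy_mlp(x):
    """Hypothetical fixed 'MLP': shift by one, then ReLU, per coordinate."""
    return [max(0.0, xi - 1.0) for xi in x]


# Hypothetical path graph 0-1-2 with one-dimensional node features.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0], 1: [2.0], 2: [3.0]}
out = gin_layer(adjacency, features, toy_mlp)

# Relabel the nodes and rerun: the outputs should follow the relabelling.
perm = {0: 2, 1: 0, 2: 1}
adjacency_p = {perm[v]: [perm[u] for u in nbrs] for v, nbrs in adjacency.items()}
features_p = {perm[v]: features[v] for v in features}
out_p = gin_layer(adjacency_p, features_p, toy_mlp)
equivariant = all(out_p[perm[v]] == out[v] for v in out)
```

Because the neighbour aggregation is a sum, renaming the nodes only renames the outputs, so `equivariant` holds on this example.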

This review was generated by AI and reviewed by a human editor.