QUICK REVIEW

[Paper Review] Reinforcement Learning for Integer Programming: Learning to Cut

Yunhao Tang, Shipra Agrawal|arXiv (Cornell University)|Jun 11, 2019

Topic: Reinforcement Learning in Robotics | References: 26 | Cited by: 53

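One-Sentence Summary

This paper formulates the selection of Gomory cutting planes as a deep reinforcement learning problem and shows that RL-guided cuts improve integer programming performance, including within Branch-and-Cut, across a range of problem classes and sizes.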

ABSTRACT

Integer programming (IP) is a general optimization framework widely applicable to a variety of unstructured and structured problems arising in, e.g., scheduling, production planning, and graph optimization. As IP models many provably hard to solve problems, modern IP solvers rely on many heuristics. These heuristics are usually human-designed, and naturally prone to suboptimality. The goal of this work is to show that the performance of those solvers can be greatly enhanced using reinforcement learning (RL). In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method. This method is employed as a subroutine by all modern IP solvers. We present a deep RL formulation, network architecture, and algorithms for intelligent adaptive selection of cutting planes (aka cuts). Across a wide range of IP tasks, we show that the trained RL agent significantly outperforms human-designed heuristics, and effectively generalizes to 10X larger instances and across IP problem classes. The trained agent is also demonstrated to benefit the popular downstream application of cutting plane methods in Branch-and-Cut algorithm, which is the backbone of state-of-the-art commercial IP solvers.

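Motivation and Objectives

Demonstrate that reinforcement learning can significantly improve the selection of Gomory cuts in integer programming.
Develop an efficient Markov decision process (MDP) formulation and a deep RL policy for adaptive selection of Gomory cuts.
Evaluate generalization across IP sizes and problem classes, and assess the impact on Branch-and-Cut solvers.
Provide insight into the types of cuts the RL agent learns, including their relationship to known inequalities for packing problems.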
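
For readers unfamiliar with the Gomory cuts named in the objectives above, the following is a minimal sketch of the textbook Gomory fractional cut derived from one simplex tableau row. It assumes a pure integer program with nonnegative variables and a row of the form x_B + sum_j a_j x_j = b over the nonbasic variables. The function name and the tolerance are illustrative choices, not taken from the paper, and the paper's own implementation may express cuts in a different but equivalent form.

```python
import math

def gomory_fractional_cut(nonbasic_coeffs, rhs, tol=1e-9):
    """Return (coeffs, bound) for the cut sum_j coeffs[j] * x_j >= bound.

    nonbasic_coeffs: tableau coefficients of the nonbasic variables in one row.
    rhs: the row's right-hand side (value of the basic variable).
    Returns None when rhs is already integral, since no cut can be derived.
    """
    f0 = rhs - math.floor(rhs)            # fractional part of the right-hand side
    if f0 < tol or f0 > 1 - tol:
        return None
    # Fractional part of every coefficient; floor handles negatives correctly.
    coeffs = [a - math.floor(a) for a in nonbasic_coeffs]
    return coeffs, f0

# Row: x1 + 0.5*x3 - 0.25*x4 = 2.75  gives the cut  0.5*x3 + 0.75*x4 >= 0.75
cut = gomory_fractional_cut([0.5, -0.25], 2.75)
```

Each tableau row with a fractional right-hand side yields one candidate cut, so a single LP solution typically produces several candidates. Choosing among them is the decision the paper hands to the RL agent.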

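Proposed Method

Formulate cutting-plane selection as a Markov decision process: the state consists of the LP constraints, the current LP solution, and the candidate Gomory cuts; the action is the choice of one candidate Gomory cut.
Use an attention-based deep RL policy that is invariant to the order of the constraints to score and select among the candidate cuts.
Embed variable-length constraints with an LSTM to handle problems of different sizes, and use an attention mechanism to compute action probabilities.
Train with evolution strategies, which estimate the policy gradient from rollouts collected over multiple IP instances.
Evaluate using integrality gap closure (IGC) and compare against the Random, Max Violation, Max Normalized Violation, and Lexicographical baselines.
Test in a Branch-and-Cut (B&C) setting to assess RL-selected cuts as a subroutine and measure their effect on the number of nodes expanded.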
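
The policy and training steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: a single tanh layer stands in for the LSTM embedding (so all rows must have the same length), each cut is scored by its average inner product with the constraint embeddings, and the evolution-strategies estimator uses antithetic Gaussian perturbations. The names embed, cut_probabilities, and es_gradient, and the parameters sigma and n_pairs, are hypothetical.

```python
import math
import random

def embed(row, W):
    # Shared one-layer embedding of a row [coefficients..., rhs].
    # Assumption: replaces the LSTM embedding used for variable-length rows.
    return [math.tanh(sum(w * r for w, r in zip(w_vec, row))) for w_vec in W]

def cut_probabilities(constraints, cuts, W):
    """Softmax over attention scores; one probability per candidate cut."""
    h = [embed(r, W) for r in constraints]   # constraint embeddings
    g = [embed(r, W) for r in cuts]          # candidate-cut embeddings
    # Score of a cut = mean inner product with all constraint embeddings,
    # so reordering the constraints does not change the result.
    scores = [
        sum(sum(a * b for a, b in zip(g_j, h_i)) for h_i in h) / len(h)
        for g_j in g
    ]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def es_gradient(reward_fn, theta, sigma=0.1, n_pairs=32, seed=0):
    """Antithetic evolution-strategies estimate of d E[reward] / d theta."""
    rng = random.Random(seed)
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = reward_fn(plus) - reward_fn(minus)
        for i, e in enumerate(eps):
            grad[i] += diff * e
    return [g_i / (2 * sigma * n_pairs) for g_i in grad]
```

In a real training loop, reward_fn would run a cutting-plane rollout with the perturbed policy parameters and return a scalar such as the total improvement in the LP objective. Here it is left as an arbitrary callable so the estimator can be checked on a simple function first.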

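Experimental Results

Research Questions

RQ1: Do RL-guided Gomory cuts reduce the number of cuts needed to reach optimality across IP classes?
RQ2: How effectively does RL close the integrality gap compared with traditional heuristics?
RQ3: Do RL policies generalize across instance sizes and problem classes, and do they improve the efficiency of Branch-and-Cut?
RQ4: What is the nature of the learned cuts, and do they resemble known valid inequalities for knapsack problems, such as lifted cover inequalities?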
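
To make RQ2 concrete, here is a small sketch of how integrality gap closure and two of the baseline heuristics could be computed. It assumes IGC is the fraction of the initial gap between the LP relaxation and the IP optimum that the added cuts remove, and that cuts are written as e^T x <= d, so the violation at the current LP solution x is e^T x - d. The normalized variant divides by the Euclidean norm of e. These conventions and all function names are assumptions for illustration and may differ in detail from the paper.

```python
import math

def integrality_gap_closed(lp_obj_initial, lp_obj_after_cuts, ip_obj_optimal):
    """Fraction of the initial integrality gap removed by the cuts (0 to 1)."""
    g0 = abs(ip_obj_optimal - lp_obj_initial)      # gap before any cut
    g_t = abs(ip_obj_optimal - lp_obj_after_cuts)  # gap after adding cuts
    if g0 == 0:
        return 1.0                                 # LP relaxation already tight
    return (g0 - g_t) / g0

def violation(coeffs, rhs, x):
    # How far the point x lies beyond the cut coeffs . x <= rhs.
    return sum(e * xi for e, xi in zip(coeffs, x)) - rhs

def max_violation(cuts, x):
    """Index of the cut that x violates the most."""
    return max(range(len(cuts)), key=lambda j: violation(cuts[j][0], cuts[j][1], x))

def max_normalized_violation(cuts, x):
    """Index of the cut with the largest violation per unit coefficient norm."""
    def score(j):
        coeffs, rhs = cuts[j]
        return violation(coeffs, rhs, x) / math.sqrt(sum(e * e for e in coeffs))
    return max(range(len(cuts)), key=score)

# The two heuristics can disagree when cuts have very different scales.
cuts = [([1.0, 0.0], 0.2), ([30.0, 40.0], 68.5)]
x_lp = [1.0, 1.0]
```

With these toy numbers, the raw violation favors the second cut because its coefficients are large, while the normalized rule favors the first. This scale sensitivity is one reason such hand-designed rules can pick weak cuts.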

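Key Findings

On packing, production planning, binary packing, and max-cut problems, RL needs significantly fewer cuts than the baselines to reach optimality.
On larger instances, RL significantly improves integrality gap closure, even where cuts alone may not reach optimality.
Policies trained on smaller instances generalize to larger ones and even transfer across IP classes with competitive performance.
Within the Branch-and-Cut framework, RL-selected cuts reduce the number of subproblems expanded and improve overall efficiency.
On knapsack problems, the cuts learned by RL resemble lifted cover inequalities, suggesting an interpretable and meaningful cutting strategy.
Used as a subroutine in Branch-and-Cut, RL-driven cuts yield notable improvements in solver performance.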
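
As background for the knapsack finding above, the sketch below illustrates the plain (unlifted) cover inequality that lifted cover inequalities strengthen. For a knapsack constraint sum_i w_i x_i <= b with binary x, a set C is a cover when its total weight exceeds b, and then sum_{i in C} x_i <= |C| - 1 holds at every feasible point. The brute-force validity check is only practical for tiny instances. The function names are illustrative, and the lifting step itself is not shown.

```python
from itertools import product

def is_cover(weights, capacity, cover):
    # A cover is a set of items that cannot all fit in the knapsack together.
    return sum(weights[i] for i in cover) > capacity

def is_minimal_cover(weights, capacity, cover):
    # Minimal: removing any single item makes the remaining set fit.
    if not is_cover(weights, capacity, cover):
        return False
    total = sum(weights[i] for i in cover)
    return all(total - weights[i] <= capacity for i in cover)

def cover_inequality_is_valid(weights, capacity, cover):
    """Check by enumeration that sum_{i in cover} x_i <= |cover| - 1
    holds at every feasible 0/1 point of the knapsack constraint."""
    for x in product((0, 1), repeat=len(weights)):
        feasible = sum(w * xi for w, xi in zip(weights, x)) <= capacity
        if feasible and sum(x[i] for i in cover) > len(cover) - 1:
            return False
    return True

weights, capacity = [5, 4, 3, 2], 8
minimal = is_minimal_cover(weights, capacity, (0, 1))        # items 0 and 1 weigh 9 > 8
valid = cover_inequality_is_valid(weights, capacity, (0, 1)) # x0 + x1 <= 1
```

A set that is not a cover, such as items 2 and 3 here, fails the validity check because both items fit together.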

This review was generated by AI and reviewed by a human editor.