QUICK REVIEW

[Paper Review] Learning to Perform Local Rewriting for Combinatorial Optimization

Xinyun Chen, Yuandong Tian | arXiv (Cornell University) | Sep 30, 2018

Constraint Satisfaction and Optimization | References: 54 | Cited by: 155

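SUMMARY

NeuRewriter learns a policy that improves solutions to combinatorial optimization tasks by iteratively rewriting local parts of the current solution, rather than solving from scratch. It outperforms strong baselines in expression simplification, online job scheduling, and vehicle routing.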

ABSTRACT

Search-based methods for hard combinatorial optimization are often guided by heuristics. Tuning heuristics in various conditions and situations is often time-consuming. In this paper, we propose NeuRewriter that learns a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence. The policy factorizes into a region-picking and a rule-picking component, each parameterized by a neural network trained with actor-critic methods in reinforcement learning. NeuRewriter captures the general structure of combinatorial problems and shows strong performance in three versatile tasks: expression simplification, online job scheduling and vehicle routing problems. NeuRewriter outperforms the expression simplification component in Z3; outperforms DeepRM and Google OR-tools in online job scheduling; and outperforms recent neural baselines and Google OR-tools in vehicle routing problems.

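MOTIVATION AND OBJECTIVES

Reduce the need for manual heuristic tuning by learning a policy-based local rewriting framework.
Develop NeuRewriter, which iteratively improves a given solution through region-based and rule-based rewriting.
Demonstrate the transferability and robustness of the approach across multiple domains.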

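PROPOSED METHOD

A two-part policy: region picking selects a region of the current solution, and rule picking decides which rewrite action to apply to that region.
The policy is trained with actor-critic reinforcement learning, using a Q-function as the region scorer.
The reward r = c(s_t) - c(s_{t+1}), where c is the cost of a solution, encourages cumulative improvement.
Neural networks parameterize the region-picking Q-function and the rule-picking policy, based on domain-specific state representations.
The domains are expression simplification (Halide parse trees), online job scheduling (dependency graphs), and vehicle routing (routes).
A unified rewriting pipeline applies the selected rule to the selected region to obtain the next state, and repeats until convergence.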
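
The loop described above (score regions, pick a rule, apply it, collect the reward c(s_t) - c(s_{t+1}), repeat until convergence) can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the toy state (a list of integers), the inversion-count cost, the two rules, and the hand-written `region_score` and `pick_rule` functions are all assumptions made here. In the paper, the region scorer and the rule picker are neural networks trained with actor-critic methods; greedy argmax selection and the "stop when reward is not positive" convergence test are simplifications chosen for this sketch.

```python
from typing import Callable, List, Tuple

State = List[int]   # toy "solution": an ordering of items (hypothetical)
Region = int        # toy region: index i, meaning the adjacent pair (i, i + 1)
Rule = Callable[[State, Region], State]


def cost(s: State) -> int:
    """Toy cost c(s): the number of out-of-order pairs (inversions)."""
    return sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])


def swap_rule(s: State, i: Region) -> State:
    """Rewrite rule: swap the two items inside the region."""
    t = list(s)
    t[i], t[i + 1] = t[i + 1], t[i]
    return t


def keep_rule(s: State, i: Region) -> State:
    """Rewrite rule: leave the region unchanged."""
    return list(s)


RULES: List[Rule] = [swap_rule, keep_rule]


def region_score(s: State, i: Region) -> float:
    """Hand-written stand-in for the learned Q-function region scorer."""
    return float(s[i] - s[i + 1])  # larger when the pair is more out of order


def pick_rule(s: State, i: Region) -> Rule:
    """Hand-written stand-in for the learned rule-picking policy (one-step look-ahead)."""
    return min(RULES, key=lambda rule: cost(rule(s, i)))


def rewrite(s0: State, max_steps: int = 100) -> Tuple[State, List[int]]:
    """Iteratively rewrite local regions of the state; return final state and rewards."""
    s, rewards = list(s0), []
    if len(s) < 2:                     # no region to rewrite
        return s, rewards
    for _ in range(max_steps):
        region = max(range(len(s) - 1), key=lambda i: region_score(s, i))
        nxt = pick_rule(s, region)(s, region)
        r = cost(s) - cost(nxt)        # reward r = c(s_t) - c(s_{t+1})
        if r <= 0:                     # toy convergence test: no further improvement
            break
        rewards.append(r)
        s = nxt
    return s, rewards


final, rewards = rewrite([3, 1, 4, 2])
print(final, rewards)
```

Because each reward is a difference of consecutive costs, the undiscounted rewards telescope: their sum equals c(s_0) - c(s_T), the total improvement over the episode. In this sketch that means `sum(rewards) == cost([3, 1, 4, 2]) - cost(final)`.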

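EXPERIMENTAL RESULTS

Research questions

RQ1: Can a learned local rewriting policy outperform hand-tuned heuristics and neural models that predict a complete solution across several discrete optimization problems?
RQ2: Does the region-picking plus rule-picking factorization generalize across problem domains and distributions?
RQ3: How does NeuRewriter compare with traditional solvers and neural baselines in solution quality and running time?
RQ4: What do ablation studies reveal about the respective contributions of region picking and rule picking?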

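Key findings

In expression simplification, NeuRewriter reduces expression length and parse tree size by about 52% and 59% on average, respectively.
In the reported experiments, it outperforms Z3-simplify, Halide-rule, and heuristic search, and it runs faster than Z3-ctx-solver-simplify.
In online job scheduling, NeuRewriter outperforms Google OR-tools and DeepRM, especially in more complex settings with heterogeneous resources.
For vehicle routing, NeuRewriter outperforms recent neural baselines and OR-tools, and approaches the offline optimal bound for VRP with 20 nodes.
Ablation experiments indicate that the method is robust to distribution shift and generalizes to longer expressions and different workload configurations.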

This review was generated by AI and checked by a human editor.