QUICK REVIEW

[Paper Review] Variational Approach for Job Shop Scheduling

Seung Heon Oh, Jiwon Baek|arXiv (Cornell University)|Jan 30, 2026

Scheduling and Optimization Algorithms | Cited by 0

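One-Sentence Summary

This paper proposes VG2S, a Variational Graph-to-Scheduler framework that uses variational inference and maximum-entropy RL to decouple representation learning from policy optimization, solving the JSSP with strong zero-shot generalization.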

ABSTRACT

This paper proposes a novel Variational Graph-to-Scheduler (VG2S) framework for solving the Job Shop Scheduling Problem (JSSP), a critical task in manufacturing that directly impacts operational efficiency and resource utilization. Conventional Deep Reinforcement Learning (DRL) approaches often face challenges such as non-stationarity during training and limited generalization to unseen problem instances because they optimize representation learning and policy execution simultaneously. To address these issues, we introduce variational inference to the JSSP domain for the first time and derive a probabilistic objective based on the Evidence Lower Bound (ELBO) with maximum entropy reinforcement learning. By mathematically decoupling representation learning from policy optimization, the VG2S framework enables the agent to learn robust structural representations of scheduling instances through a variational graph encoder. This approach significantly enhances training stability and robustness against hyperparameter variations. Extensive experiments demonstrate that the proposed method exhibits superior zero-shot generalization compared with state-of-the-art DRL baselines and traditional dispatching rules, particularly on large-scale and challenging benchmark instances such as DMU and SWV.

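Motivation and Objectives

Motivates and addresses the training instability and poor generalization of end-to-end DRL for the JSSP.
Proposes a variational framework that decouples representation learning from policy optimization.
Develops VG2S, which combines a variational graph encoder with a sequence-based policy decoder.
Demonstrates improved zero-shot generalization on large-scale benchmarks such as DMU and SWV.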

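Proposed Method

Formalizes the JSSP as a disjunctive graph and defines static and dynamic features for each operation.
Introduces a variational graph encoder that learns latent representations through an ELBO containing a reconstruction term and a policy term.
Uses a two-stage training procedure: variational representation learning, followed by policy learning with a maximum-entropy objective.
Implements the encoder as a graph neural network with heterogeneous edge types and a variational latent space z.
Uses a graph-to-sequence style policy decoder that selects scheduling actions with a glimpse attention mechanism.
Trains with reconstruction losses on nodes and edges, a KL divergence on the latent space, and an entropy-regularized policy-gradient objective.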
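
To make the disjunctive-graph formulation in the first item concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `build_disjunctive_graph`, the `(machine, duration)` input layout, and the toy instance are all assumptions chosen for illustration. Conjunctive edges encode the fixed operation order inside a job; disjunctive edges link operations that compete for the same machine and whose order a scheduler still has to decide.

```python
from itertools import combinations


def build_disjunctive_graph(jobs):
    """Build a disjunctive graph for a JSSP instance (illustrative helper).

    jobs: list of jobs; each job is a list of (machine, duration) tuples
          given in the order the operations must be processed.
    Returns (nodes, conjunctive_edges, disjunctive_edges).
    """
    nodes = {}        # (job, op_index) -> static features of the operation
    conjunctive = []  # directed precedence edges inside each job
    by_machine = {}   # machine -> operations that need this machine

    for j, ops in enumerate(jobs):
        for k, (machine, duration) in enumerate(ops):
            nodes[(j, k)] = {"machine": machine, "duration": duration}
            by_machine.setdefault(machine, []).append((j, k))
            if k > 0:
                # Operation k of job j can only start after operation k - 1.
                conjunctive.append(((j, k - 1), (j, k)))

    disjunctive = []  # undirected pairs sharing a machine
    for ops_on_machine in by_machine.values():
        disjunctive.extend(combinations(ops_on_machine, 2))

    return nodes, conjunctive, disjunctive


# Hypothetical 2-job, 2-machine instance.
toy_jobs = [
    [(0, 3), (1, 2)],  # job 0: machine 0 for 3 time units, then machine 1 for 2
    [(1, 4), (0, 1)],  # job 1: machine 1 for 4 time units, then machine 0 for 1
]
nodes, conj, disj = build_disjunctive_graph(toy_jobs)
```

A heterogeneous-edge GNN encoder of the kind described above would treat the two edge lists as separate edge types; how VG2S actually featurizes nodes and edges is not specified in this review.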

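Experimental Results

Research Questions

RQ1: Does variational inference improve the robustness of representation learning for the JSSP compared with end-to-end DRL?
RQ2: Does decoupling representation learning from policy optimization improve training stability and generalization to unseen instances?
RQ3: How well does VG2S generalize zero-shot to large-scale, challenging JSSP benchmarks?
RQ4: How does the variational latent space affect the clustering of instances by topology and the resulting scheduling performance?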

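Key Findings

VG2S achieves better zero-shot generalization on large benchmarks than existing DRL baselines and traditional dispatching rules.
The latent space produced by the variational encoder already clusters instances by topology before any policy training.
Decoupling representation learning from policy optimization improves training stability and robustness to hyperparameter variations.
By combining the ELBO with entropy-regularized RL, the method can handle instance variability and the latent stochasticity in scheduling.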
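
As a rough illustration of two ingredients named above, the KL term of an ELBO and an entropy-regularized policy-gradient loss, the sketch below computes both for a single sample in plain Python. It assumes a diagonal Gaussian posterior with a standard normal prior and a softmax policy over candidate actions; the function names and the temperature `alpha` are hypothetical, and the exact objective and weighting used by VG2S are not given in this review.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_pg_loss(logits, action, advantage, alpha):
    """Single-step surrogate loss: -advantage * log pi(action) - alpha * H(pi).

    alpha is an assumed temperature that trades reward against exploration.
    """
    probs = softmax(logits)
    return -advantage * math.log(probs[action]) - alpha * entropy(probs)


# A posterior equal to the prior has zero KL.
kl_zero = kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Uniform policy over two actions, positive advantage for action 0.
pg_loss = entropy_regularized_pg_loss([0.0, 0.0], 0, 1.0, 0.1)
```

In a two-stage setup like the one the review describes, the KL term would be used while training the encoder and the entropy-regularized term while training the policy; in practice both would be computed on batched tensors in an autodiff framework rather than on Python lists.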

This review was generated by AI and checked by a human editor.