QUICK REVIEW

[Paper Review] Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method

Haoyuan Hu, Xiaodong Zhang|arXiv (Cornell University)|Aug 20, 2017

Optimization and Packing Problems | References: 6 | Cited by: 100

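One-Sentence Summary

The paper proposes a new 3D bin packing problem whose objective is to minimize the surface area of the bin, and shows that a Pointer Network-based reinforcement learning method outperforms a heuristic by about 5% on real data, with beam search improving the results.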

ABSTRACT

In this paper, a new type of 3D bin packing problem (BPP) is proposed, in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that can minimize the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Our research shows that this problem is NP-hard. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. Among these factors, the sequence of items plays a key role in minimizing the surface area. Inspired by recent achievements of deep reinforcement learning (DRL) techniques, especially Pointer Network, on combinatorial optimization problems such as TSP, a DRL-based method is applied to optimize the sequence of items to be packed into the bin. Numerical results show that the method proposed in this paper achieves about a 5% improvement over the heuristic method.
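To make the objective concrete, the sketch below computes the surface area of the smallest origin-anchored box that encloses a set of already-placed items. The `Placement` tuple and the function name are illustrative assumptions, not notation from the paper.

```python
from typing import NamedTuple, Sequence


class Placement(NamedTuple):
    # Hypothetical representation: the item's corner nearest the origin,
    # plus its dimensions after the orientation has been chosen.
    x: float
    y: float
    z: float
    dx: float
    dy: float
    dz: float


def bin_surface_area(placements: Sequence[Placement]) -> float:
    """Surface area of the smallest origin-anchored box containing all items."""
    length = max(p.x + p.dx for p in placements)
    width = max(p.y + p.dy for p in placements)
    height = max(p.z + p.dz for p in placements)
    return 2.0 * (length * width + length * height + width * height)


# Two unit cubes side by side along x need a 2 x 1 x 1 bin.
side_by_side = [Placement(0, 0, 0, 1, 1, 1), Placement(1, 0, 0, 1, 1, 1)]
print(bin_surface_area(side_by_side))  # 10.0
```

The sketch only evaluates a finished packing; it does not check the non-overlap constraint or decide where items go.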

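Motivation and Objectives

Address real-world packing scenarios in which the bin size is not fixed and the cost of a bin scales with its surface area.
Define a new NP-hard variant of the 3D bin packing problem that focuses on minimizing the surface area of the bin that holds all items.
Develop a reinforcement learning method, inspired by Pointer Networks, that optimizes the packing sequence, and compare it with a heuristic.
Demonstrate empirical gains on real-data orders containing 8, 10, or 12 items.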

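Proposed Method

Formulate the problem for 3D cuboid items: minimize the surface area of the bin subject to non-overlap and boundary constraints.
Use a constructive DRL approach to optimize the packing sequence; item orientation and the choice of empty space are guided by a heuristic.
Use a Pointer Network (an encoder-decoder with attention) to output the packing order.
Train with policy gradient (REINFORCE), introducing a baseline b(s) to reduce gradient variance.
Initialize the baseline from packing plans generated by the heuristic, then update it through memory replay.
At test time, combine greedy selection with beam search (BS) to improve the predicted sequence.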
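The training step above can be sketched in a few lines. The sketch makes strong simplifying assumptions that are not taken from the paper: one learnable logit per item stands in for the Pointer Network, `toy_cost` stands in for the packed bin's surface area, and an exponential moving average stands in for the heuristic-initialized, memory-replay baseline. Only the REINFORCE-with-baseline update itself is the point.

```python
import math
import random


def sample_order(logits, rng):
    """Sample a permutation item by item; also return d(log-prob)/d(logits)."""
    remaining = list(range(len(logits)))
    order, grad = [], [0.0] * len(logits)
    while remaining:
        top = max(logits[i] for i in remaining)  # subtract max for stability
        weights = [math.exp(logits[i] - top) for i in remaining]
        total = sum(weights)
        chosen = rng.choices(remaining, weights=weights)[0]
        # Gradient of log-softmax over the remaining items:
        # indicator(chosen) minus the selection probability.
        for i, w in zip(remaining, weights):
            grad[i] += (1.0 if i == chosen else 0.0) - w / total
        order.append(chosen)
        remaining.remove(chosen)
    return order, grad


def toy_cost(order, sizes):
    """Placeholder cost (NOT surface area): lower when large items go first."""
    return sum(pos * sizes[i] for pos, i in enumerate(order))


def train(sizes, steps=500, lr=0.05, decay=0.9, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(sizes)
    baseline = None
    for _ in range(steps):
        order, grad = sample_order(logits, rng)
        cost = toy_cost(order, sizes)
        if baseline is None:
            baseline = cost  # assumption: first sample initializes b
        advantage = cost - baseline  # subtracting b reduces gradient variance
        # Descend the expected cost: step against advantage * grad log p.
        logits = [l - lr * advantage * g for l, g in zip(logits, grad)]
        baseline = decay * baseline + (1.0 - decay) * cost
    return logits


sizes = [1.0, 3.0, 2.0]
logits = train(sizes)
# Items ranked by learned preference for being packed early.
print(sorted(range(len(sizes)), key=lambda i: -logits[i]))
```

Because the baseline is computed before it sees the current sample's cost, subtracting it leaves the gradient estimate unbiased while shrinking its variance.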

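Experimental Results

Research Questions

RQ1: Can a Pointer Network-based DRL method learn packing sequences that minimize the surface area of a non-fixed bin?
RQ2: How does DRL-based sequencing compare with a carefully designed heuristic on this new 3D BPP variant?
RQ3: Does beam search at inference time bring a significant improvement over random sampling or greedy decoding?
RQ4: To what extent can the choice of item orientation and empty maximal space be incorporated into, or improved by, the DRL framework?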
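RQ3 contrasts greedy decoding with beam search. The sketch below shows the difference on a hand-written toy distribution: `beam_search`, `toy_scorer`, and `TOY_PROBS` are hypothetical names, and the probabilities are made up for illustration. How the paper scores or selects among beams is not stated in this review, so ranking by sequence log-probability is an assumption.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

StepScorer = Callable[[Tuple[int, ...], Sequence[int]], Dict[int, float]]


def beam_search(n_items: int, step_log_probs: StepScorer,
                beam_width: int) -> List[Tuple[Tuple[int, ...], float]]:
    """Keep the beam_width most probable partial orderings at every step."""
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    for _ in range(n_items):
        candidates = []
        for prefix, score in beams:
            remaining = [i for i in range(n_items) if i not in prefix]
            for item, logp in step_log_probs(prefix, remaining).items():
                candidates.append((prefix + (item,), score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Made-up next-item probabilities for three items, keyed by the prefix so far.
TOY_PROBS = {
    (): {0: 0.5, 1: 0.4, 2: 0.1},
    (0,): {1: 0.6, 2: 0.4},
    (1,): {0: 0.9, 2: 0.1},
    (2,): {0: 0.5, 1: 0.5},
}


def toy_scorer(prefix, remaining):
    if len(remaining) == 1:
        return {remaining[0]: 0.0}  # last item is forced: log(1) = 0
    return {item: math.log(p) for item, p in TOY_PROBS[prefix].items()}


greedy = beam_search(3, toy_scorer, beam_width=1)[0]
wide = beam_search(3, toy_scorer, beam_width=2)[0]
print(greedy[0])  # (0, 1, 2)
print(wide[0])    # (1, 0, 2)
```

With a beam width of 1 the search is greedy and commits to item 0 first (sequence probability 0.30); a width of 2 keeps the second-best first choice alive and finds the more probable ordering (0.36).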

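Key Findings

Items per order	Random	Heuristic	RL sampling	RL beam search
8	44.70	43.97	41.82	41.82
10	48.38	47.33	45.03	45.02
12	50.78	49.34	46.71	46.71

The DRL-based method reduces surface area by about 5% relative to the heuristic on Bin8, Bin10, and Bin12.
Beam search with a beam size of 3 improves on the heuristic baseline by 4.89%, 4.88%, and 5.33% on Bin8, Bin10, and Bin12, respectively.
On 5,000 Bin8 samples, reinforcement learning with beam search comes close to the optimal sequences found by exhaustive enumeration.
The paper proves that the new 3D BPP variant is NP-hard.
The method shows that DRL can outperform a carefully designed heuristic on a practical 3D packing task with real data.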
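The reported percentages can be reproduced from the table with a few lines of arithmetic. The values are copied from the Heuristic and RL beam search columns; the dictionary layout and function name are illustrative.

```python
# Values copied from the results table (lower is better).
results = {
    "Bin8": {"heuristic": 43.97, "rl_beam_search": 41.82},
    "Bin10": {"heuristic": 47.33, "rl_beam_search": 45.02},
    "Bin12": {"heuristic": 49.34, "rl_beam_search": 46.71},
}


def relative_improvement(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline


improvements = {
    name: round(relative_improvement(r["heuristic"], r["rl_beam_search"]), 2)
    for name, r in results.items()
}
print(improvements)  # {'Bin8': 4.89, 'Bin10': 4.88, 'Bin12': 5.33}
```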

This review was generated by AI and checked by a human editor.