QUICK REVIEW

[Paper Review] Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method

Haoyuan Hu, Xiaodong Zhang|arXiv (Cornell University)|Aug 20, 2017

Optimization and Packing Problems|References: 6|Citations: 100

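One-line summary

The paper introduces a new 3D bin packing problem whose objective is to minimize the surface area of the bin, and shows on real-world data that a Pointer Network-based DRL method outperforms a heuristic by about 5%. In the reported table, beam search matches RL sampling and improves on it only marginally (45.02 vs. 45.03 for 10 items).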

ABSTRACT

In this paper, a new type of 3D bin packing problem (BPP) is proposed, in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that can minimize the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Our research shows that this problem is NP-hard. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. Among these factors, the sequence of items plays a key role in minimizing the surface area. Inspired by recent achievements of deep reinforcement learning (DRL) techniques, especially Pointer Network, on combinatorial optimization problems such as TSP, a DRL-based method is applied to optimize the sequence of items to be packed into the bin. Numerical results show that the method proposed in this paper achieve about 5% improvement than heuristic method.

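Motivation and objectives

Motivate the problem with a practical packing scenario in which the bin size is not fixed and the bin cost scales with its surface area.
Define a new NP-hard 3D BPP variant that focuses on minimizing the surface area of a bin that can contain all items.
Develop a DRL-based method inspired by Pointer Networks to optimize the packing order, and compare it with a heuristic.
Show empirical gains on real-world orders containing 8, 10, and 12 items.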
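To make the objective concrete, here is a minimal sketch of how the surface area of the smallest enclosing bin could be computed from a finished packing. The `Placement` layout, the function name, and the convention that the bin is anchored at the origin are assumptions made for illustration. The factor of 2 is the geometric surface area of a closed cuboid; the paper's exact formulation may differ by a constant factor.

```python
from typing import Sequence, Tuple

# Hypothetical layout: (x, y, z, length, width, height), where (x, y, z) is the
# corner of the item closest to the origin and all coordinates are >= 0.
Placement = Tuple[float, float, float, float, float, float]


def bin_surface_area(placements: Sequence[Placement]) -> float:
    """Surface area of the smallest origin-anchored cuboid containing all items."""
    if not placements:
        raise ValueError("at least one placed item is required")
    # The bin must reach the farthest face of any item along each axis.
    bin_length = max(x + length for x, _, _, length, _, _ in placements)
    bin_width = max(y + width for _, y, _, _, width, _ in placements)
    bin_height = max(z + height for _, _, z, _, _, height in placements)
    return 2.0 * (
        bin_length * bin_width
        + bin_width * bin_height
        + bin_length * bin_height
    )


# Two unit cubes side by side along x need a 2 x 1 x 1 bin.
side_by_side = [(0.0, 0.0, 0.0, 1.0, 1.0, 1.0), (1.0, 0.0, 0.0, 1.0, 1.0, 1.0)]
area = bin_surface_area(side_by_side)  # 2 * (2 + 1 + 2) = 10.0
```

The sketch only scores a packing; it does not check that items are non-overlapping, which a full formulation would enforce as a constraint.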

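Proposed method

Formulate the problem as minimizing the bin's surface area subject to non-overlap and containment constraints on the 3D cuboid items.
Adopt a constructive DRL approach that optimizes the packing order, while the item orientations and the choice of empty space are guided by a heuristic.
Use a Pointer Network (encoder-decoder with attention) to output the packing order.
Train with policy gradient (REINFORCE), using a baseline b(s) to reduce gradient variance.
Initialize the baseline from heuristic-generated packing plans and refine it with memory replay.
At test time, use greedy selection, and apply beam search (BS) to improve the predicted sequence.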
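The REINFORCE-with-baseline update described above can be sketched on a much smaller policy. This is not the paper's Pointer Network: the sketch swaps in a Plackett-Luce policy with one learnable score per item, so that the gradient of the log-probability can be written by hand. The names `sample_order`, `reinforce_step`, and `cost_fn` are hypothetical, `cost_fn` stands in for "pack the items in this order with the heuristic and return the bin's surface area", and the moving-average baseline is an assumed simplification of the memory-replay refinement.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_order(theta: Sequence[float], rng: random.Random) -> Tuple[List[int], List[float]]:
    """Sample an ordering and return it with d(log p(order)) / d(theta)."""
    remaining = list(range(len(theta)))
    order: List[int] = []
    grad = [0.0] * len(theta)
    while remaining:
        # Softmax over the items that have not been placed yet.
        top = max(theta[i] for i in remaining)
        weights = [math.exp(theta[i] - top) for i in remaining]
        total = sum(weights)
        probs = [w / total for w in weights]
        chosen = rng.choices(remaining, weights=probs, k=1)[0]
        # Gradient of log-softmax: indicator(chosen) minus probabilities.
        for i, p in zip(remaining, probs):
            grad[i] -= p
        grad[chosen] += 1.0
        order.append(chosen)
        remaining.remove(chosen)
    return order, grad


def reinforce_step(
    theta: Sequence[float],
    cost_fn: Callable[[List[int]], float],
    baseline: float,
    rng: random.Random,
    lr: float = 0.05,
    baseline_decay: float = 0.9,
) -> Tuple[List[float], float, List[int], float]:
    """One policy-gradient step that lowers the expected cost."""
    order, grad = sample_order(theta, rng)
    cost = cost_fn(order)
    advantage = cost - baseline  # the baseline reduces gradient variance
    new_theta = [t - lr * advantage * g for t, g in zip(theta, grad)]
    # Assumed refinement rule: exponential moving average of observed costs.
    new_baseline = baseline_decay * baseline + (1.0 - baseline_decay) * cost
    return new_theta, new_baseline, order, cost


if __name__ == "__main__":
    sizes = [3.0, 1.0, 2.0]

    def toy_cost(order: List[int]) -> float:
        # Toy stand-in for surface area: cheaper when large items come first.
        return sum(pos * sizes[item] for pos, item in enumerate(order))

    rng = random.Random(0)
    theta, baseline = [0.0] * len(sizes), toy_cost([0, 1, 2])
    for _ in range(500):
        theta, baseline, _, _ = reinforce_step(theta, toy_cost, baseline, rng)
    print(theta)
```

Passing the cost of a heuristic-generated ordering as the initial `baseline` mirrors the initialization described above; everything else about the baseline here is an assumption.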

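Experimental results

Research questions

RQ1 Can a Pointer Network-based DRL method learn packing orders that minimize the surface area of a bin whose size is not fixed?
RQ2 How does DRL-based sequence construction compare with a well-designed heuristic on this new 3D BPP variant?
RQ3 Does beam search at inference provide a meaningful improvement over random sampling or greedy decoding?
RQ4 To what extent can the selection of item orientation and empty maximal space be incorporated into the DRL framework, and does doing so improve results?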

Key findings

Number of items	Random	Heuristic	RL sampling	RL beam search
8	44.70	43.97	41.82	41.82
10	48.38	47.33	45.03	45.02
12	50.78	49.34	46.71	46.71
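The relative gain over the heuristic can be recomputed directly from the table. The short sketch below does that arithmetic for the beam-search column; the function and variable names are chosen for illustration only.

```python
def relative_improvement(reference: float, value: float) -> float:
    """Percentage by which `value` is smaller than `reference`."""
    return 100.0 * (reference - value) / reference


# Surface-area values copied from the table above, keyed by number of items.
heuristic = {8: 43.97, 10: 47.33, 12: 49.34}
rl_beam_search = {8: 41.82, 10: 45.02, 12: 46.71}

improvements = {
    n: round(relative_improvement(heuristic[n], rl_beam_search[n]), 2)
    for n in heuristic
}
print(improvements)  # {8: 4.89, 10: 4.88, 12: 5.33}
```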

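The DRL-based method outperforms the heuristic by about 5% in surface-area reduction across Bin8, Bin10, and Bin12.
Beam search with a beam size of 3 improves on the heuristic baseline by 4.89%, 4.88%, and 5.33% on Bin8, Bin10, and Bin12, respectively.
On 5,000 Bin8 samples, the RL results with beam search are close to optimal when compared with the exact optimal sequences.
The paper establishes that the new 3D BPP variant is NP-hard and provides a proof of NP-hardness.
Overall, the approach shows that DRL can outperform a carefully designed heuristic on a practical 3D packing task built from real-world data.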
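Beam search with a beam size of 3, as used in the findings above, can be sketched as follows. This is a generic illustration, not the paper's decoder: `step_log_probs` is a hypothetical stand-in for the trained Pointer Network, and `make_toy_policy` is an assumed toy model that simply prefers larger items. The sketch ranks candidates by model log-probability; a real system might instead pack each surviving sequence and keep the one with the smallest surface area.

```python
import math
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[Tuple[int, ...], float]  # (partial ordering, cumulative log-probability)


def beam_search_order(
    n_items: int,
    step_log_probs: Callable[[Tuple[int, ...]], Sequence[float]],
    beam_width: int = 3,
) -> Beam:
    """Return the best complete ordering found, with its log-probability."""
    beams: List[Beam] = [((), 0.0)]
    for _ in range(n_items):
        candidates: List[Beam] = []
        for prefix, score in beams:
            log_probs = step_log_probs(prefix)
            used = set(prefix)
            # Extend each kept prefix with every item not yet placed.
            for item in range(n_items):
                if item not in used:
                    candidates.append((prefix + (item,), score + log_probs[item]))
        # Keep only the `beam_width` most probable partial orderings.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


def make_toy_policy(volumes: Sequence[float]) -> Callable[[Tuple[int, ...]], List[float]]:
    """Toy policy (an assumption): pick a remaining item in proportion to its volume."""

    def step_log_probs(prefix: Tuple[int, ...]) -> List[float]:
        used = set(prefix)
        total = sum(v for i, v in enumerate(volumes) if i not in used)
        return [
            -math.inf if i in used else math.log(v / total)
            for i, v in enumerate(volumes)
        ]

    return step_log_probs


best_order, best_log_prob = beam_search_order(
    4, make_toy_policy([2.0, 5.0, 1.0, 4.0]), beam_width=3
)
print(best_order)  # (1, 3, 0, 2): largest volume first under the toy policy
```

Setting `beam_width=1` reduces the procedure to greedy decoding, which is why beam search can only match or improve on the greedy sequence under the model's own scores.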

This review was generated by AI and checked by a human editor.