QUICK REVIEW

[Paper Review] Automated Counting of Stacked Objects in Industrial Inspection

Corentin Dumery, Noa Etté|arXiv (Cornell University)|Mar 16, 2026

Industrial Vision Systems and Defect Detection | Citations: 0

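One-Sentence Summary

This paper introduces 3D Counting (3DC), a framework that counts stacked, heavily occluded objects by decomposing the task into two estimates made from multi-view images: the volume of the stack and its occupancy ratio. This enables accurate counting without relying on a single view of the visible objects or on specialized sensors.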

ABSTRACT

Visual object counting is a fundamental computer vision task in industrial inspection, where accurate, high-throughput inventory tracking and quality assurance are critical. Moreover, manufactured parts are often too light to reliably deduce their count from their weight, or too heavy to move the stack on a scale safely and practically, making automated visual counting the more robust solution in many scenarios. However, existing methods struggle with stacked 3D items in containers, pallets, or bins, where most objects are heavily occluded and only a few are directly visible. To address this important yet underexplored challenge, we propose a novel 3D counting approach that decomposes the task into two complementary subproblems: estimating the 3D geometry of the stack and its occupancy ratio from multi-view images. By combining geometric reconstruction with deep learning-based depth analysis, our method can accurately count identical manufactured parts inside containers, even when they are irregularly stacked and partially hidden. We validate our 3D counting pipeline on large-scale synthetic and diverse real-world data with manually verified total counts, demonstrating robust performance under realistic inspection conditions.

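Motivation and Objectives

Address the problem of counting heavily occluded, stacked objects in containers and industrial settings.
Propose a decomposed counting framework that separates volume estimation from occupancy estimation.
Develop an occupancy-ratio network and a volume reconstruction pipeline to achieve accurate counts.
Provide large-scale synthetic and real-world datasets with ground-truth counts for benchmarking.
Give the theoretical rationale for the counting formula and demonstrate its practicality in industry.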

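Proposed Method

Formulate the count as N = (gamma * V) / Vo, where gamma is the volume occupancy ratio, V is the total volume of the stack, and Vo is the volume of a single object.
To estimate V from multi-view images, reconstruct the objects and the container using voxel carving based on 3D Gaussian Splatting, then subtract the container wall thickness.
Predict gamma from a depth map with a learned occupancy network, Phi, trained on a synthetic dataset of 400,000 images (14,000 scenes).
Select a top-down view from the multiple images, compute its depth with a monocular depth estimator, and feed the result to Phi, so that occupancy estimation does not depend on object shape.
Obtain Vo from the known object geometry or from a template object. When the geometry is unknown, estimate Vo from the template using segmentation and a reference shape.
Combine the depth-based occupancy estimate with the volume estimate to compute the final count, and validate on synthetic and real-world benchmarks and against human counting performance.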
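The final combination step, N = (gamma * V) / Vo, can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `estimate_count`, the input validation, and the rounding to the nearest integer are all assumptions made for this example.

```python
def estimate_count(gamma: float, stack_volume: float, unit_volume: float) -> int:
    """Combine the three estimates into a count: N = gamma * V / Vo.

    gamma        -- occupancy ratio, the fraction of the stack volume filled by objects
    stack_volume -- V, total volume of the stack (any unit)
    unit_volume  -- Vo, volume of one object (same unit as stack_volume)
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if stack_volume < 0.0:
        raise ValueError("stack_volume must be non-negative")
    if unit_volume <= 0.0:
        raise ValueError("unit_volume must be positive")
    # Rounding to an integer is an assumption of this sketch; note that
    # Python's round() sends exact halves to the nearest even integer.
    return round(gamma * stack_volume / unit_volume)


if __name__ == "__main__":
    # Half of an 8-unit stack is filled by objects of 0.25 units each.
    print(estimate_count(gamma=0.5, stack_volume=8.0, unit_volume=0.25))  # 16
```

Because the three quantities enter as a simple product and quotient, a relative error in any one of them carries over roughly one-to-one into the count, which is why the review treats volume and occupancy estimation as separate problems.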

Figure 1 : 3D Counting (3DC). We estimate both the total volume occupied by the stack and the fraction of this volume taken up by the objects from multiple views of objects to be counted. Combining these estimates yields the total number of objects.
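To make the volume estimate concrete, the following toy sketch shows the basic idea behind voxel carving: a voxel survives only if it projects inside the object silhouette in every view. It is heavily simplified and not the paper's pipeline. It assumes three axis-aligned orthographic silhouettes on a cubic grid instead of calibrated cameras and 3D Gaussian Splatting, it omits the container wall-thickness subtraction, and the name `carve_volume` and its parameters are hypothetical.

```python
from itertools import product


def carve_volume(top, front, side, voxel_size):
    """Estimate a volume by carving an n x n x n voxel grid with three silhouettes.

    top[x][y]   -- True where the object is visible when looking along the z axis
    front[x][z] -- True where the object is visible when looking along the y axis
    side[y][z]  -- True where the object is visible when looking along the x axis
    voxel_size  -- edge length of one voxel

    All three masks are assumed to be n x n nested lists of booleans.
    """
    n = len(top)
    kept = 0
    for x, y, z in product(range(n), repeat=3):
        # Keep the voxel only if it falls inside every silhouette.
        if top[x][y] and front[x][z] and side[y][z]:
            kept += 1
    return kept * voxel_size ** 3


if __name__ == "__main__":
    # A 4 x 4 x 4 cube centred in an 8 x 8 x 8 grid looks like a square from every side.
    square = [[2 <= i < 6 and 2 <= j < 6 for j in range(8)] for i in range(8)]
    print(carve_volume(square, square, square, voxel_size=0.5))  # 8.0
```

Carving only ever removes voxels, so the result is an upper bound on the true volume: concavities that are hidden in every silhouette are never carved away.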

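Experimental Results

Research Questions

RQ1: Can the occupancy ratio gamma be reliably estimated from depth maps of occluded stacks?
RQ2: How accurately can the total volume V be reconstructed from multi-view images using 3D Gaussian Splatting and voxel carving?
RQ3: Does decomposing counting into occupancy estimation and volume estimation improve accuracy over end-to-end, single-step counting?
RQ4: How robust is the method to depth-map quality and to the domain gap between synthetic training data and real images?
RQ5: How well does the method generalize across object shapes, containers, and industrial scenarios?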

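Key Findings

The proposed 3DC framework counts overlapping, stacked objects in containers with a two-stage approach that combines volume estimation with occupancy estimation.
A large-scale synthetic dataset (400,000 images, 14,000 scenes) and a real-world benchmark (3,229 images, 58 scenes) support robust evaluation and generalization.
The occupancy network Phi, trained on synthetic depth maps, predicts gamma from depth; combined with multi-view volume estimation, it yields accurate counts that outperform several baselines and human performance in the test settings.
Volume estimation with 3D Gaussian Splatting and voxel carving substantially outperforms convex-hull and alpha-concave-hull baselines at recovering the stack volume.
An ablation shows that training Phi on estimated depth rather than perfect depth improves real-world performance, indicating effective domain adaptation.
The counting equation is validated under ideal synthetic conditions, confirming that gamma, V, and Vo together give a good approximation of the true count.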
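The counting equation follows directly from the definition of the occupancy ratio. The short derivation below is an editorial sketch, not quoted from the paper; it assumes N identical objects of volume V_o that do not interpenetrate, and it reuses the symbols of this review.

```latex
\gamma
  = \frac{\text{volume occupied by the objects}}{\text{volume of the stack}}
  = \frac{N \, V_o}{V}
\quad\Longrightarrow\quad
N = \frac{\gamma \, V}{V_o}
```

Under ideal conditions the identity is exact; in practice gamma, V, and V_o are each estimated, so the recovered N is an approximation.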

Figure 2 : 3DC pipeline. We decompose the counting task into estimating the volume of the objects to be counted and then estimating the occupancy ratio within that volume. The first is done on the basis of geometry reconstructed from segmentations in multiple images. The second uses as input a depth-


This review was generated by AI and checked by a human editor.