QUICK REVIEW

[Paper Review] Automated Counting of Stacked Objects in Industrial Inspection

Corentin Dumery, Noa Etté|arXiv (Cornell University)|Mar 16, 2026

Industrial Vision Systems and Defect Detection | Cited by 0

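One-Sentence Summary

The paper proposes a 3D counting framework (3DC) that decomposes the counting of stacked, occluded objects into two estimates obtained from multi-view images: the volume of the stack and the occupancy ratio within it. This enables accurate counting without relying on a single visible view or on specialized sensors.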

ABSTRACT

Visual object counting is a fundamental computer vision task in industrial inspection, where accurate, high-throughput inventory tracking and quality assurance are critical. Moreover, manufactured parts are often too light to reliably deduce their count from their weight, or too heavy to move the stack on a scale safely and practically, making automated visual counting the more robust solution in many scenarios. However, existing methods struggle with stacked 3D items in containers, pallets, or bins, where most objects are heavily occluded and only a few are directly visible. To address this important yet underexplored challenge, we propose a novel 3D counting approach that decomposes the task into two complementary subproblems: estimating the 3D geometry of the stack and its occupancy ratio from multi-view images. By combining geometric reconstruction with deep learning-based depth analysis, our method can accurately count identical manufactured parts inside containers, even when they are irregularly stacked and partially hidden. We validate our 3D counting pipeline on large-scale synthetic and diverse real-world data with manually verified total counts, demonstrating robust performance under realistic inspection conditions.

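Motivation and Objectives

Address the challenge of counting heavily occluded, stacked objects in containers and industrial settings.
Propose a decomposed counting framework that separates volume estimation from occupancy estimation.
Develop an occupancy-ratio network and a volume reconstruction pipeline to achieve accurate counts.
Provide large-scale synthetic and real-world datasets, with ground-truth counts serving as a benchmark.
Give a theoretical justification for the counting formula and demonstrate its practical applicability in industry.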

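Proposed Method

Formulate counting as N = (gamma * V) / Vo, where gamma is the volumetric occupancy ratio, V is the volume of the stack, and Vo is the volume of a single object.
Estimate V from multi-view images using voxel carving based on 3D Gaussian Splatting, reconstructing both the objects and the container and subtracting the container thickness.
Predict gamma from a depth map with a learned occupancy network Phi, trained on 400,000 synthetic images covering 14,000 scenes.
Select one key view from the multiple images (based on top-down visibility) and obtain its depth with a monocular depth estimator as input to Phi, so that occupancy estimation still works when object geometry varies.
Estimate Vo from known object geometry or from a template object; when the geometry is unknown, infer Vo from the template via segmentation and a reference shape.
Combine the depth-based occupancy estimate with the volume estimate to compute the final count, and validate it on synthetic and real benchmarks and against human estimates.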
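
The counting formula N = (gamma * V) / Vo can be made concrete with a small worked example. The following is a minimal Python sketch, not the authors' implementation: the function name `estimate_count`, the rounding to the nearest integer, and all numeric values are illustrative assumptions.

```python
def estimate_count(gamma: float, stack_volume: float, unit_volume: float) -> int:
    """Apply N = (gamma * V) / Vo and round to the nearest whole object.

    gamma        -- occupancy ratio, the fraction of the stack volume filled by objects
    stack_volume -- V, total volume of the stack (any unit, e.g. cm^3)
    unit_volume  -- Vo, volume of a single object (same unit as V)
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is a volume fraction and must lie in [0, 1]")
    if stack_volume < 0 or unit_volume <= 0:
        raise ValueError("stack_volume must be >= 0 and unit_volume must be > 0")
    # Rounding is an assumption of this sketch; the formula itself is real-valued.
    return round(gamma * stack_volume / unit_volume)


# Hypothetical bin: 12,000 cm^3 of stack, half of it filled by parts of 8 cm^3 each.
print(estimate_count(gamma=0.5, stack_volume=12_000.0, unit_volume=8.0))  # → 750
```

Because the formula is linear in gamma and in V, a relative error in either estimate carries over to the count in the same proportion: overestimating gamma by 10% overestimates N by 10%.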

Figure 1 : 3D Counting (3DC). We estimate both the total volume occupied by the stack and the fraction of this volume taken up by the objects from multiple views of objects to be counted. Combining these estimates yields the total number of objects.

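Experimental Results

Research Questions

RQ1: Can the occupancy ratio gamma be reliably inferred from depth maps of partially occluded stacks?
RQ2: How accurately can the total stack volume V be reconstructed from multi-view images via 3D splatting and voxel carving?
RQ3: Is decomposing counting into occupancy estimation and volume estimation more accurate than end-to-end, single-step counting?
RQ4: How robust is the method to depth-map quality and to the domain gap between synthetic training data and real images?
RQ5: How well does the method generalize across object shapes, containers, and industrial scenarios?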

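Key Findings

The proposed 3DC framework counts overlapping, stacked objects in containers, going from images to a final count via a two-stage approach (occupancy-ratio estimation and volume estimation).
A large synthetic dataset (400,000 images, 14,000 scenes) and a real-world benchmark (3,229 images, 58 scenes) support robust evaluation and generalization.
The occupancy network Phi, trained on synthetic depth maps, predicts gamma from depth; combined with multi-view volume estimation, it yields counts that are more accurate than several baselines and than human estimates in the tested settings.
Volume estimation via 3D Gaussian Splatting and voxel carving recovers the stack volume significantly better than baselines such as the convex hull and alpha shapes.
Ablation studies show that training Phi on predicted depth maps rather than perfect depth improves real-world performance, indicating effective domain adaptation.
The counting equation is validated under idealized synthetic conditions, confirming that gamma, V, and Vo together give a good approximation of the true count.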
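
To illustrate how a carved voxel grid turns silhouettes into a volume estimate, here is a deliberately simplified Python sketch. It is not the paper's pipeline: it assumes three orthographic, axis-aligned views and binary silhouettes on a tiny grid, whereas the paper works from multi-view images with 3D Gaussian Splatting. The names `carve_volume` and `project`, and the L-shaped test stack, are hypothetical.

```python
from itertools import product


def project(solid, axis):
    """Orthographic silhouette of a voxel set, viewed along axis 0 (x), 1 (y) or 2 (z)."""
    return {tuple(c for i, c in enumerate(voxel) if i != axis) for voxel in solid}


def carve_volume(silhouettes, grid_size, voxel_edge):
    """Keep only voxels whose projection lies inside every silhouette.

    silhouettes -- list of three sets of 2D cells, indexed by viewing axis
    grid_size   -- number of voxels along each side of the cubic grid
    voxel_edge  -- edge length of one voxel, in the desired length unit
    """
    kept = 0
    for voxel in product(range(grid_size), repeat=3):
        # A voxel survives carving only if all views see it as occupied.
        if all(
            tuple(c for i, c in enumerate(voxel) if i != axis) in silhouettes[axis]
            for axis in range(3)
        ):
            kept += 1
    return kept * voxel_edge ** 3


# Hypothetical L-shaped stack, two voxels tall, inside a 4 x 4 x 4 grid.
stack = {(x, y, z) for x, y, z in product(range(4), repeat=3)
         if z < 2 and (x < 2 or y < 2)}
views = [project(stack, axis) for axis in range(3)]

print(carve_volume(views, grid_size=4, voxel_edge=1.0))  # → 24.0
```

In this toy case the top view reveals the L-shaped footprint, so carving recovers all 24 voxels exactly, while the axis-aligned bounding box of the same stack would have volume 4 * 4 * 2 = 32. In general, silhouette carving yields an upper bound on the true shape, since concavities that no view exposes cannot be carved away.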

Figure 2 : 3DC pipeline. We decompose the counting task into estimating the volume of the objects to be counted and then estimating the occupancy ratio within that volume. The first is done on the basis of geometry reconstructed from segmentations in multiple images. The second uses as input a depth-


This review was generated by AI and checked by a human editor.