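[Paper Walkthrough] Generative 6D Pose Estimation via Conditional Flow Matching
This paper proposes Flose, a conditional flow matching method for instance-level 6D pose estimation. It fuses overlap-aware geometric features with appearance features and uses robust RANSAC-based registration, achieving a +4.5 Average Recall (AR) gain on the BOP benchmark.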
Existing methods for instance-level 6D pose estimation typically rely on neural networks that either directly regress the pose in $\mathrm{SE}(3)$ or estimate it indirectly via local feature matching. The former struggle with object symmetries, while the latter fail in the absence of distinctive local features. To overcome these limitations, we propose a novel formulation of 6D pose estimation as a conditional flow matching problem in $\mathbb{R}^3$. We introduce Flose, a generative method that infers object poses via a denoising process conditioned on local features. While prior approaches based on conditional flow matching perform denoising solely based on geometric guidance, Flose integrates appearance-based semantic features to mitigate ambiguities caused by object symmetries. We further incorporate RANSAC-based registration to handle outliers. We validate Flose on five datasets from the established BOP benchmark. Flose outperforms prior methods with an average improvement of +4.5 Average Recall. Project website: https://tev-fbk.github.io/Flose/
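Motivation and Objectives
- Address the limitations of existing approaches: direct SE(3) regression struggles with object symmetries, and indirect feature-matching methods fail when distinctive local features are absent.
- Formulate instance-level 6D pose estimation as conditional flow matching in $\mathbb{R}^3$.
- Incorporate appearance-based semantic features from a vision foundation model to disambiguate symmetric objects.
- Improve robustness to outliers through RANSAC-based registration and ICP refinement.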
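Proposed Method
- Formulates 6D pose estimation as a conditional flow matching problem in $\mathbb{R}^3$.
- Fuses overlap-aware geometric features with appearance-based semantic features to condition the denoising process.
- Uses a denoising network $\Psi_\Omega$ to learn a displacement field that maps noisy samples to the aligned shape.
- Conditions the flow model on the combined features and positional encodings to guide denoising.
- Applies a RANSAC-based Kabsch solver for robust pose initialization, followed by ICP refinement.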
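To make the flow matching idea in the list above concrete, here is a minimal NumPy sketch of denoising points in $\mathbb{R}^3$. It assumes a linear interpolation path between Gaussian noise and the target points, which is a common choice in flow matching but is not confirmed by this summary as the exact parameterization Flose uses. The names `interpolate`, `target_velocity`, `euler_denoise`, and `oracle_velocity` are hypothetical, and the oracle merely stands in for a trained, feature-conditioned network such as $\Psi_\Omega$.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the (assumed) linear path from noise x0 to target x1 at time t."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target for a velocity network on the linear path (constant in t)."""
    return x1 - x0

def euler_denoise(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
target = rng.normal(size=(128, 3))  # stand-in for object points placed in the scene
noise = rng.normal(size=(128, 3))   # Gaussian starting samples

def oracle_velocity(x, t):
    """Hypothetical oracle: the exact velocity toward the known target.
    A real model would predict this from conditioning features instead."""
    return (target - x) / (1.0 - t)

denoised = euler_denoise(noise, oracle_velocity, steps=10)
print(np.abs(denoised - target).max())  # close to zero with the oracle field
```

In training, a network would be fit to `target_velocity(x0, x1)` at points `interpolate(x0, x1, t)` for random `t`; the oracle above only illustrates the sampling loop.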
![Fig. 3 : Qualitative comparison of Flose (center) vs. an RPF-based [ 24 ] baseline adapted for pose estimation (right). By integrating semantic features and outlier-robust registration, Flose predicts more accurate poses under severe occlusions (rows 1-2) and resolves symmetry ambiguities where pure](https://ar5iv.labs.arxiv.org/html/2602.19719/assets/main/figures/qualitatives/LMO_APE_000788.png)
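The outlier-robust registration step named in the method (a Kabsch solve inside a RANSAC loop) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the 3-point minimal sample, the iteration count, and the inlier threshold are all hypothetical, and the ICP refinement stage is omitted.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # guard against reflections
    t = dst_c - R @ src_c
    return R, t

def ransac_kabsch(src, dst, iters=200, thresh=0.01, seed=0):
    """Fit (R, t) to correspondences src[i] <-> dst[i] in the presence of outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found no consensus set")
    R, t = kabsch(src[best_inliers], dst[best_inliers])    # refit on all inliers
    return R, t, best_inliers

# Synthetic check: known rotation about z plus a translation, 30% corrupted matches.
rng = np.random.default_rng(1)
a = 0.7
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.2, -0.1, 0.5])
src = rng.normal(size=(200, 3))
dst = src @ R_true.T + t_true
dst[:60] = rng.normal(size=(60, 3))              # outlier correspondences

R_est, t_est, inliers = ransac_kabsch(src, dst)
print(inliers.sum(), np.abs(R_est - R_true).max())
```

In Flose, the correspondences would come from the denoised point positions rather than from synthetic data; the sketch only shows why sampling minimal sets and refitting on the consensus set tolerates a large fraction of bad matches.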
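Experimental Results
Research Questions
- RQ1: Can conditional flow matching in $\mathbb{R}^3$ accurately estimate instance-level 6D object poses under symmetry and occlusion?
- RQ2: Does combining appearance-based semantic features with geometric cues improve the disambiguation of symmetric objects?
- RQ3: Is robust outlier handling via RANSAC-based registration necessary for reliable pose estimation in this framework?
- RQ4: How does Flose compare with state-of-the-art methods across the diverse objects and conditions of the BOP benchmark?
Key Findings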
| Method | Single model (S.M.) | LM-O | T-LESS | TUD-L | IC-BIN | YCB-V | Avg |
|---|---|---|---|---|---|---|---|
| Pix2Pose [22] | | 58.8 | 51.2 | 82.0 | 39.0 | 78.8 | 62.0 |
| ZebraPose [23] | | 75.2 | 72.7 | 94.8 | 65.2 | 86.6 | 78.9 |
| GDRNPP (BOP22) [17] | | 77.5 | 87.4 | 96.6 | 72.2 | 92.1 | 85.2 |
| HccePose(BF) [28] | | 80.5 | 87.9 | 94.4 | 72.4 | 91.1 | 85.3 |
| GDRNPP (BOP23) [17] | | 79.4 | 91.4 | 96.4 | 73.7 | 92.8 | 86.7 |
| Koenig-Hybrid | ✓ | 63.1 | 65.5 | 92.0 | 43.0 | 70.1 | 66.7 |
| CosyPose | ✓ | 71.4 | 70.1 | 93.9 | 64.7 | 86.1 | 77.2 |
| SurfEmb | ✓ | 75.8 | 83.3 | 93.3 | 65.6 | 82.4 | 80.1 |
| CIR | ✓ | 73.4 | 77.6 | 96.8 | 67.6 | 89.3 | 81.0 |
| PFA | ✓ | 79.7 | 85.0 | 96.0 | 67.6 | 88.8 | 83.4 |
| Flose (ours) | ✓ | 86.1 | 86.9 | 98.8 | 74.8 | 92.8 | 87.9 |
| Improv. over row 10 (PFA) | | +6.4 | +1.9 | +2.8 | +7.2 | +4.0 | +4.5 |
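- Flose improves average AR by 4.5 over the strongest single-model competitor (PFA) across the five compared datasets.
- Flose outperforms both the per-object baselines and the single-model baselines in average AR, with notable gains on symmetric objects.
- Integrating appearance features with overlap-aware geometry markedly improves AR and the inlier ratio under strict matching thresholds.
- RANSAC-based registration followed by ICP refinement adds robustness and contributes roughly 4.3 additional AR on top of purely geometric refinement.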
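The improvement row and the Avg column of the table can be re-derived from the per-dataset scores. The short check below uses only the PFA and Flose values reported above; the variable names are illustrative.

```python
# Per-dataset AR scores copied from the table (LM-O, T-LESS, TUD-L, IC-BIN, YCB-V).
datasets = ["LM-O", "T-LESS", "TUD-L", "IC-BIN", "YCB-V"]
pfa = [79.7, 85.0, 96.0, 67.6, 88.8]
flose = [86.1, 86.9, 98.8, 74.8, 92.8]

# Per-dataset gain of Flose over PFA, rounded to one decimal like the table.
gains = [round(f - p, 1) for f, p in zip(flose, pfa)]

# Average AR per method and the resulting average gain.
avg_pfa = sum(pfa) / len(pfa)
avg_flose = sum(flose) / len(flose)
avg_gain = round(avg_flose - avg_pfa, 1)

for name, g in zip(datasets, gains):
    print(f"{name}: +{g}")
print(f"Avg: PFA {avg_pfa:.1f}, Flose {avg_flose:.1f}, gain +{avg_gain}")
```

The averages come out to 83.4 and 87.9, which matches the Avg column and confirms that the reported +4.5 is measured against PFA.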

This walkthrough was generated by AI and reviewed by a human editor.