QUICK REVIEW

[Paper Review] Perception-Based Beliefs for POMDPs with Visual Observations

Miriam Dorothea Schäfers, Merlijn Krale|arXiv (Cornell University)|Feb 5, 2026

Reinforcement Learning in Robotics | Citations: 0

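One-sentence summary

This paper proposes Perception-based Beliefs for POMDPs (PBP), a framework that integrates a visual perception model, which maps visual observations to distributions over states and uses them in the belief update, and adds uncertainty quantification to improve robustness to visual corruption. The framework is compatible with existing POMDP solvers.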

ABSTRACT

Partially observable Markov decision processes (POMDPs) are a principled planning model for sequential decision-making under uncertainty. Yet, real-world problems with high-dimensional observations, such as camera images, remain intractable for traditional belief- and filtering-based solvers. To tackle this problem, we introduce the Perception-based Beliefs for POMDPs framework (PBP), which complements such solvers with a perception model. This model takes the form of an image classifier which maps visual observations to probability distributions over states. PBP incorporates these distributions directly into belief updates, so the underlying solver does not need to reason explicitly over high-dimensional observation spaces. We show that the belief update of PBP coincides with the standard belief update if the image classifier is exact. Moreover, to handle classifier imprecision, we incorporate uncertainty quantification and introduce two methods to adjust the belief update accordingly. We implement PBP using two traditional POMDP solvers and empirically show that (1) it outperforms existing end-to-end deep RL methods and (2) uncertainty quantification improves robustness of PBP against visual corruption.

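Motivation and objectives

Motivate planning under uncertainty in VPOMDPs, where observations are high-dimensional images.
Decouple perception from planning by mapping images to state distributions, which keeps belief updates scalable.
Incorporate uncertainty quantification into perception to improve robustness to visual corruption.
Provide a versatile framework that plugs into existing POMDP solvers (e.g. HSVI, POMCP), and compare it with end-to-end DRL methods.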

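Proposed method

Introduces a vision-factorizable VPOMDP model, in which visual observations depend only on the visual component of the state, so that belief updates can be driven by perception.
Defines a perception model f: Z_v -> Delta(S_v) that approximates Pr(S_v | Z_v) from a visual dataset.
Derives a perception-based update rule that coincides with the standard update when the perception model is exact (Equation 5 reduces to the standard update).
Incorporates uncertainty quantification (calibration via temperature scaling, TUQ, and weighted UUQ) so that unreliable perception outputs are adjusted or ignored during the belief update (Equations 6 and 7).
To integrate the perception-based update into existing planners (HSVI, POMCP), prepares $\hat{M}$, which uses a planning subset of the dataset for planning and a separate perception dataset for the uncertainty-aware update.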
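
The update and the uncertainty adjustments listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the entropy-based uncertainty score, the `max_entropy` threshold, and the division by a dataset prior `prior` are all hypothetical choices made for this example, and they are not claimed to match Equations 5 to 7 or the paper's definitions of TUQ and UUQ. States are indexed 0..n-1 and the transition matrix is for one fixed action.

```python
import math
from typing import List, Sequence

Dist = List[float]


def normalize(weights: Sequence[float]) -> Dist:
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have positive total mass")
    return [w / total for w in weights]


def predict(belief: Sequence[float], transition: Sequence[Sequence[float]]) -> Dist:
    """Prediction step: predicted(s') = sum_s transition[s][s'] * belief(s)."""
    n = len(belief)
    return [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]


def temperature_softmax(logits: Sequence[float], temperature: float) -> Dist:
    """Softmax over logits / temperature; temperature > 1 flattens the output."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    return normalize([math.exp(x - top) for x in scaled])


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy scaled to [0, 1]: 0 is a point mass, 1 is uniform (assumed score)."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def perception_update(predicted: Sequence[float], perception: Sequence[float],
                      prior: Sequence[float]) -> Dist:
    """b'(s') proportional to predicted(s') * perception(s') / prior(s').

    `prior` is the assumed state distribution behind the perception dataset;
    with a uniform prior the division has no effect after normalization.
    """
    return normalize([b * f / p for b, f, p in zip(predicted, perception, prior)])


def threshold_update(predicted: Sequence[float], perception: Sequence[float],
                     prior: Sequence[float], max_entropy: float = 0.5) -> Dist:
    """Ignore the perception output entirely when it is too uncertain."""
    if normalized_entropy(perception) > max_entropy:
        return list(predicted)
    return perception_update(predicted, perception, prior)


def weighted_update(predicted: Sequence[float], perception: Sequence[float],
                    prior: Sequence[float]) -> Dist:
    """Blend the updated and predicted beliefs by a confidence weight."""
    weight = 1.0 - normalized_entropy(perception)
    updated = perception_update(predicted, perception, prior)
    return [weight * u + (1.0 - weight) * b for u, b in zip(updated, predicted)]


if __name__ == "__main__":
    transition = [[0.9, 0.1], [0.2, 0.8]]  # toy two-state dynamics
    prior = [0.5, 0.5]                     # assumed uniform dataset prior
    predicted = predict([0.5, 0.5], transition)
    classifier_output = temperature_softmax([3.0, 0.0], temperature=1.5)
    print(threshold_update(predicted, classifier_output, prior))
    print(weighted_update(predicted, classifier_output, prior))
```

In this sketch the planner never sees the image itself: it only receives the classifier's distribution over states, which is what lets a conventional solver stay unchanged. A uniform classifier output leaves the predicted belief untouched in both variants, which is the behaviour one would want for an uninformative image.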

Figure 2. An illustration (as Bayesian network representation) of the observation function of a VPOMDP under Assumption 1. Notably, $\textbf{z}_{\mathrm{v}}^{\prime}$ may only depend on the visual variables of $\textbf{s}^{\prime}$, i.e. $\textbf{s}_{\mathrm{v}}^{\prime}$, while $\textbf{z}_{\ne

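Experimental results

Research questions

RQ1: How can visual observations be integrated into POMDP belief updates without blowing up the observation space?
RQ2: How should a perception model map images to state distributions so that the standard belief update is preserved when the model is exact, and how should its uncertainty be handled?
RQ3: Does uncertainty quantification improve the robustness of VPOMDP planning under visual corruption?
RQ4: Can the framework achieve competitive performance when combined with traditional POMDP solvers?
RQ5: How does it compare with end-to-end deep RL methods in vision-based planning settings?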

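Key findings

When the perception model is exact, the belief update coincides with the standard update (theoretical equivalence).
When visual observations are distorted or uncertain, uncertainty quantification improves the robustness of the belief update.
Implementations of the framework with HSVI and POMCP are competitive with state-of-the-art VPOMDP solvers and outperform end-to-end deep RL baselines, particularly under visual corruption.
Using a planning-specific visual dataset together with a perception dataset supports practical integration with existing POMDP solvers and uncertainty-aware planning.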
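
The equivalence in the first finding can be motivated with a short Bayes' rule argument. The derivation below is a sketch in this review's own notation rather than a reproduction of the paper's Equation 5: $O_v$ (the visual observation function), $T$ (the transition function) and $p$ (the marginal over visual states underlying the perception dataset, assumed strictly positive) are symbols introduced here for illustration, and non-visual observation components are omitted for brevity.

```latex
% Sketch only: O_v, T and p are assumed notation, not taken from the paper.
\begin{align}
  f(z_v)(s_v')
    &= \Pr(s_v' \mid z_v)
     = \frac{O_v(z_v \mid s_v')\, p(s_v')}
            {\sum_{\tilde{s}_v} O_v(z_v \mid \tilde{s}_v)\, p(\tilde{s}_v)}
    && \text{(exact classifier, Bayes' rule)} \\
  \Rightarrow \quad O_v(z_v \mid s_v')
    &\propto \frac{f(z_v)(s_v')}{p(s_v')}
    && \text{(as a function of } s_v' \text{ for fixed } z_v \text{)} \\
  b'(s')
    &\propto O_v(z_v \mid s_v') \sum_{s} T(s' \mid s, a)\, b(s)
     \;\propto\; \frac{f(z_v)(s_v')}{p(s_v')} \sum_{s} T(s' \mid s, a)\, b(s)
    && \text{(standard update, rewritten)}
\end{align}
```

The denominator in the first line does not depend on $s_v'$, so it disappears when the belief is normalized. If $p$ is uniform, the classifier output can be multiplied into the predicted belief directly.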

Figure 3. Overview of the Perception-based Beliefs for POMDPs Framework (PBP). See Section 4.3 for a detailed explanation.

This review was generated by AI and checked by a human editor.