QUICK REVIEW

[Paper Review] Online Bayesian Goal Inference for Boundedly-Rational Planning Agents

Tan Zhi‐Xuan, Jordyn L. Mann | arXiv (Cornell University) | Jun 13, 2020

AI-based Problem Solving and Planning | References: 3 | Cited by: 31

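One-Sentence Summary

The paper proposes SIPS, a sequential Monte Carlo method that infers an agent's goals online from both optimal and sub-optimal plans by modeling the agent as a boundedly-rational planner that interleaves search with execution.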

ABSTRACT

People routinely infer the goals of others by observing their actions over time. Remarkably, we can do so even when those actions lead to failure, enabling us to assist others when we detect that they might not achieve their goals. How might we endow machines with similar capabilities? Here we present an architecture capable of inferring an agent's goals online from both optimal and non-optimal sequences of actions. Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning, thereby accounting for sub-optimal behavior. These models are specified as probabilistic programs, allowing us to represent and perform efficient Bayesian inference over an agent's goals and internal planning processes. To perform such inference, we develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models, limiting computation by incrementally extending inferred plans as new actions are observed. We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.

Motivation and Objectives

Motivates the need to infer goals from sub-optimal or failed plans as humans do.
Proposes a generative model of boundedly rational planning agents interacting with a symbolic environment.
Develops Sequential Inverse Plan Search (SIPS), an online SMC algorithm leveraging replanning to limit computation.
Embeds goals, states, and observations in a PDDL-based framework to support diverse domains.
Evaluates the approach against Bayesian IRL baselines across multiple domains and human-subject benchmarks.

Proposed Method

Model agents as probabilistic programs with a goal prior, plan updates, action selection, and state transitions.
Represent goals and states using PDDL to handle diverse domains and sparse rewards.
Model sub-optimal planning via a stochastic boundedly-rational search with a random planning budget sampled from a negative binomial distribution.
Perform online inference with Sequential Inverse Plan Search (SIPS), a particle-based method that extends hypothesized plans as observations arrive.
Use resampling and two rejuvenation kernels (heuristic-driven goal proposals and error-driven replanning proposals) to maintain hypothesis diversity.
Implement inference in Gen with planning-domain embedding and leverage online partial-plan extension to keep computation tractable.
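
The steps above can be illustrated with a minimal sketch. It is not the authors' implementation (which the summary says is written in Gen) but a toy Python particle filter under loud assumptions: a 1-D corridor, a "bounded search" that plans at most a negative-binomially distributed number of unit steps toward the goal, and a noisy executor that takes the planned step with probability 1 - eps. The function names, the parameters r, p, and eps, and the +1 added to the budget are all hypothetical, and the rejuvenation kernels are omitted.

```python
import random

def sample_neg_binomial(rng, r, p):
    """Count failures before the r-th success in Bernoulli(p) trials."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def extend_plan(rng, pos, goal, r=2, p=0.5):
    """Toy bounded search: plan at most `budget` unit steps toward the goal."""
    # The +1 is our assumption so that every replanning step makes progress.
    budget = 1 + sample_neg_binomial(rng, r, p)
    step = 1 if goal > pos else -1
    return [step] * min(budget, abs(goal - pos))

def infer_goal(actions, start, goals, n_particles=1000, eps=0.1, seed=0):
    """Return an estimate of P(goal | actions) from a simple particle filter."""
    rng = random.Random(seed)
    # A particle is (goal hypothesis, remaining partial plan).
    # Spreading goals evenly over particles encodes a uniform goal prior.
    particles = [(goals[i % len(goals)], []) for i in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    pos = start
    for action in actions:
        updated = []
        for i, (goal, plan) in enumerate(particles):
            if not plan:
                # Plan exhausted: extend it from the current state only.
                plan = extend_plan(rng, pos, goal)
            if plan:
                # Noisy execution: planned step w.p. 1 - eps, else the other.
                likelihood = 1 - eps if action == plan[0] else eps
            else:
                # Already at the hypothesized goal: both moves equally likely.
                likelihood = 0.5
            weights[i] *= likelihood
            updated.append((goal, plan[1:]))
        particles = updated
        total = sum(weights)
        weights = [w / total for w in weights]
        # Resample when the effective sample size drops below half.
        ess = 1.0 / sum(w * w for w in weights)
        if ess < n_particles / 2:
            particles = rng.choices(particles, weights=weights, k=n_particles)
            weights = [1.0 / n_particles] * n_particles
        pos += action
    posterior = {g: 0.0 for g in goals}
    for (goal, _), w in zip(particles, weights):
        posterior[goal] += w
    return posterior

# The agent starts at 3, backtracks once, then heads right past 6.
posterior = infer_goal([-1, 1, 1, 1, 1, 1, 1], start=3, goals=[0, 6, 9])
print(max(posterior, key=posterior.get), posterior)
```

In this toy setting the estimate should concentrate on goal 9 despite the initial backtrack, because each hypothesis only pays a small likelihood penalty for a single off-plan step. Plans are extended only when a particle runs out of planned steps, which mirrors the incremental plan extension described above at a much smaller scale.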

Experimental Results

Research Questions

RQ1: Can online Bayesian inference recover an agent's goal from sub-optimal or failed sequences of actions?
RQ2: How does modeling boundedly rational planning (replanning with limited search) affect the ability to infer goals online?
RQ3: Does SIPS outperform Bayesian IRL baselines in accuracy and speed across diverse planning domains?
RQ4: How robust is the approach to model mismatch and to human-like planning behavior?
RQ5: Can the framework generalize to domains with compositional structure and sparse rewards?

Main Findings

Domain	Method	P(g_true|o) Q1	P(g_true|o) Q2	P(g_true|o) Q3	Top-1 Q1	Top-1 Q2	Top-1 Q3	C0 (s)	MC (s)	AC (s)	N
Taxi (3 Goals)	SIPS (ours)	0.44	0.50	0.62	0.53	0.56	0.67	13.0	1.80	2.55	1429
Taxi (3 Goals)	BIRL (unbiased)	0.34	0.35	0.79	0.33	0.42	0.92	2.23	0.00	0.16	10000
Taxi (3 Goals)	BIRL (oracle)	0.37	0.47	0.81	0.42	0.44	0.86	1.63	0.00	0.12	2500
Doors, Keys & Gems (3 Goals)	SIPS (ours)	0.37	0.51	0.61	0.74	0.74	0.74	3.30	0.70	0.86	2099
Doors, Keys & Gems (3 Goals)	BIRL (unbiased)	0.33	0.33	0.33	0.33	0.33	0.33	3326	0.12	154	250000
Doors, Keys & Gems (3 Goals)	BIRL (oracle)	0.37	0.36	0.42	0.44	0.60	0.80	150	0.12	7.01	10000
Block Words (5 Goals)	SIPS (ours)	0.47	0.83	0.90	0.78	0.84	0.91	20.8	2.46	4.15	2506
Block Words (5 Goals)	BIRL (unbiased)	0.20	0.20	0.21	0.42	0.49	0.56	687	0.27	63.6	250000
Block Words (5 Goals)	BIRL (oracle)	0.20	0.29	0.45	0.73	0.80	0.96	22.2	0.05	2.12	10000
Intrusion Detection (20 Goals)	SIPS (ours)	0.56	0.87	0.87	0.65	0.87	0.87	375	6.60	28.0	13321
Intrusion Detection (20 Goals)	BIRL (unbiased)	0.05	0.05	0.05	0.05	0.05	0.05	18038	0.75	1069	250000
Intrusion Detection (20 Goals)	BIRL (oracle)	0.09	0.24	0.53	0.94	1.00	1.00	98	0.02	6.00	10000
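
As a rough illustration of how the accuracy columns could be computed for one trajectory, the sketch below assumes that Q1, Q2, and Q3 denote the first, second, and third quartiles of the observed trajectory, and that Top-1 is 1 when the true goal has the highest posterior at that point. The function name `quartile_metrics`, the input layout, and the rounding rule for the quartile index are our assumptions, not details given in the summary.

```python
import math

def quartile_metrics(posteriors, true_goal):
    """posteriors[t] maps each goal to its posterior after t + 1 observations."""
    T = len(posteriors)
    metrics = {}
    for name, frac in (("Q1", 0.25), ("Q2", 0.50), ("Q3", 0.75)):
        # Assumed rounding rule: last timestep within the first frac of the run.
        t = max(0, math.ceil(frac * T) - 1)
        post = posteriors[t]
        top_goal = max(post, key=post.get)
        metrics[name] = {
            "p_true": post[true_goal],          # P(g_true | o) at this quartile
            "top1": int(top_goal == true_goal)  # 1 if the true goal ranks first
        }
    return metrics

# Hypothetical posteriors over two goals for a four-step trajectory.
example = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.4, "b": 0.6},
    {"a": 0.2, "b": 0.8},
    {"a": 0.1, "b": 0.9},
]
print(quartile_metrics(example, true_goal="b"))
```

Averaging `p_true` and `top1` over many trajectories in a domain would give numbers comparable in form to the table rows, assuming this reading of the column headers is right.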

SIPS accurately infers goals from both optimal and non-optimal trajectories, including backtracking and failures.
Across domains, SIPS often outperforms unbiased Bayesian IRL in accuracy and speed, sometimes matching or outperforming oracle IRL with substantially less computation.
SIPS yields higher estimates of the true goal posterior P(g_true|o) than baselines in several domains.
Human-inference patterns over time correlate more strongly with SIPS than with the BIRL baseline, indicating human-like reasoning.
SIPS demonstrates robustness to moderate mismatches between the data-generating process and the assumed agent model, and remains effective on human data.

This review was generated by AI and reviewed by human editors.