QUICK REVIEW

[论文解读] Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

Allen Z. Ren, Anushri Dixit|arXiv (Cornell University)|Jul 4, 2023

Topic Modeling被引用 43

一句话总结

KnowNo 使用保对预测来校准基于 LLM 的规划中的不确定性，允许机器人在需要时请求人工帮助，同时保证用户指定的任务成功率并减少不必要的干预。它可以与现成的 LLM 开箱即用，在仿真和真实机器人任务中展示出提高的效率。

ABSTRACT

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io

研究动机与目标

Motivate the need for calibrated uncertainty in language-based robot planning to avoid hallucinations and unsafe actions.
Propose KnowNo, a conformal-prediction-based framework to align LLM planner uncertainty with user-specified success levels.
Provide theoretical guarantees on calibrated confidence and minimal human intervention.
Demonstrate empirical benefits across simulation and hardware in diverse ambiguous scenarios.

提出的方法

Formulate planning as a multiple-choice Q&A (MCQA) using an LLM to generate candidate next steps and their uncalibrated confidences.
Apply conformal prediction (CP) to select a subset of candidates guaranteeing a user-defined coverage 1−ε.
Trigger human help when the CP prediction set is non-singleton, otherwise execute the singleton plan.
Extend CP to multi-step (sequence-level) planning by lifting to sequences and using a causally reconstructed prediction set at test time.
Provide a proof-of-calibration guarantee: with probability at least 1−δ over calibration, task completion is ≥1−ε and the average set size is minimized under the coverage constraint.
Evaluate KnowNo in simulation and hardware across spatial, numeric, attribute, and Winograd-schema ambiguities, with PaLM-2L as the primary LLM and comparisons to various baselines.

Figure 1: KnowNo uses Conformal Prediction (CP) to align the uncertainty of LLM planners. Given a language instruction, an LLM generates possible next steps and its confidences (scores) in these options. CP then provides a prediction set that includes the options with scores above a certain quantile

实验结果

研究问题

RQ1Can CP-based uncertainty estimation provide calibrated task success guarantees for LLM-based planners in robotics?
RQ2Does KnowNo reduce human intervention while maintaining a user-specified task success rate across diverse ambiguity types?
RQ3How does sequence-level (multi-step) calibration extend CP guarantees to extended planning horizons?
RQ4How does KnowNo perform relative to prompt-based and ensemble baselines in hardware and simulation?
RQ5Is KnowNo robust to different LLMs and prompt configurations?

主要发现

Table 1: Hardware Multi-Step Tabletop Rearrangement – Prediction Set and Intervention (1−ε, Plan Succ, Task Succ, Set Size, Help-Step, Help-Trial)
KnowNo	0.75	0.76	0.74	1.72	0.58	0.92
Simple Set	0.58	0.76	0.72	2.04	0.72	1.00
No Help	-	-	0.41	-	0	0

KnowNo achieves the target task success rate 1−ε with calibrated guarantees, while frequently reducing required human help compared to baselines.
In simulation, KnowNo reduces average prediction-set size and human intervention vs Simple Set and Ensemble Set across ambiguity types, with reductions up to 24% in certain settings.
Hardware experiments (multi-step tabletop rearrangement) show KnowNo lowers step-wise and trial-wise human-helpers by ~14% and reduces the average set size.
In mobile manipulation hardware scenarios, KnowNo maintains target success with reduced help and smaller prediction sets across PaLM-2L and GPT-3.5 variants.
CP-based uncertainty alignment remains effective even when the LLM confidences are imperfect, as CP provides coverage guarantees independent of |LLM calibration.|
KnowNo operates without fine-tuning the LLM and scales with foundation-model capabilities.

Figure 2: KnowNo formulates LLM planning as MCQA by first prompting an LLM to generate plausible options, and then asking it to predict the correct one. Based on the next-token likelihoods from a calibration dataset, CP finds the quantile value $\hat{q}$ such that all options with a score $\geq 1-\h

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。