QUICK REVIEW

[Paper Review] STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

ELita Lobo, Xu Chen|arXiv (Cornell University)|Mar 5, 2026

Multimodal Machine Learning Applications | Cited by: 0

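One-Sentence Summary

StructuredAgent introduces dynamic AND/OR tree planning with a structured memory module for long-horizon web tasks; it reports improved performance on WebVoyager, WebArena, and shopping benchmarks while producing interpretable plans.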

ABSTRACT

Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take actions that optimize long-term objectives. However, existing web agents struggle on complex, long-horizon tasks due to limited in-context memory for tracking history, weak planning abilities, and greedy behaviors that lead to premature termination. To address these challenges, we propose STRUCTUREDAGENT, a hierarchical planning framework with two core components: (1) an online hierarchical planner that uses dynamic AND/OR trees for efficient search and (2) a structured memory module that tracks and maintains candidate solutions to improve constraint satisfaction in information-seeking tasks. The framework also produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.

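Research Motivation and Objectives

Address the limitations of current LLM-based web agents on long-horizon tasks: memory, planning, and robustness.
Propose a hierarchical AND/OR planning framework that interleaves planning and execution to enable adaptive decision-making.
Introduce a structured memory module that tracks candidate entities and constraints during information seeking.
Provide interpretable hierarchical plans that ease debugging and human intervention.
Demonstrate effectiveness on WebVoyager, WebArena, and a complex shopping benchmark.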

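Proposed Method

Represent each task as an AND/OR planning tree with AND, OR, and ACTION nodes.
Use an LLM as a high-level controller that issues local tree operations, while the framework handles tree construction and traversal.
Expand and prune the AND/OR tree with a greedy, iterative depth-first search, combined with dynamic revision and error backpropagation.
Implement four tree operations: node expansion, node repair, global tree update, and node completion checking, all guided by an Observation Summarizer.
Introduce a structured memory module that maintains a dynamic table of candidate entities and retrieves the top-K candidates satisfying the constraints to guide decisions.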
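The tree representation and greedy depth-first traversal described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Node`, `NodeType`, `solve`, and `fake_execute` are hypothetical, the tree is fixed up front instead of being expanded and repaired by an LLM controller, and error backpropagation is reduced to a failed child making an AND parent fail or an OR parent fall through to its next alternative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class NodeType(Enum):
    AND = "and"        # succeeds only if every child succeeds
    OR = "or"          # succeeds as soon as one child succeeds
    ACTION = "action"  # leaf: a single executable step


@dataclass
class Node:
    name: str
    kind: NodeType
    children: List["Node"] = field(default_factory=list)


def solve(node: Node, execute: Callable[[str], bool], trace: List[str]) -> bool:
    """Greedy depth-first traversal; returns True if the node succeeds."""
    if node.kind is NodeType.ACTION:
        ok = execute(node.name)
        trace.append(f"{node.name}:{'ok' if ok else 'fail'}")
        return ok
    if node.kind is NodeType.AND:
        # all() short-circuits: the first failing child fails the AND node.
        return all(solve(child, execute, trace) for child in node.children)
    # OR node: any() short-circuits on the first child that succeeds.
    return any(solve(child, execute, trace) for child in node.children)


# Hypothetical shopping task: find the item by one of two routes, then add it.
plan = Node("buy item", NodeType.AND, [
    Node("find item", NodeType.OR, [
        Node("use search bar", NodeType.ACTION),
        Node("browse category", NodeType.ACTION),
    ]),
    Node("add to cart", NodeType.ACTION),
])


def fake_execute(action: str) -> bool:
    # Stand-in for a browser step; here the search bar route always fails.
    return action != "use search bar"


trace: List[str] = []
result = solve(plan, fake_execute, trace)
```

In this toy run the failed search falls through to the second OR branch, so the trace records one failure followed by two successes. In the described framework, a failure would instead trigger node repair or a global tree update, which this sketch leaves out.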
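The structured memory module can be pictured as a small table of candidates that is updated as pages are observed and then queried for the best matches. The sketch below is an assumption-laden illustration: `CandidateMemory`, `upsert`, `score`, and `top_k` are hypothetical names, and ranking candidates by the number of satisfied constraints is one plausible retrieval rule, not necessarily the one the paper uses.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A constraint is a predicate over a candidate's known attributes.
Constraint = Callable[[Dict[str, Any]], bool]


@dataclass
class CandidateMemory:
    constraints: Dict[str, Constraint]
    table: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def upsert(self, entity_id: str, **attrs: Any) -> None:
        """Add a candidate or merge newly observed attributes into it."""
        self.table.setdefault(entity_id, {}).update(attrs)

    def score(self, entity_id: str) -> int:
        """Count how many constraints the candidate currently satisfies."""
        attrs = self.table[entity_id]
        return sum(1 for check in self.constraints.values() if check(attrs))

    def top_k(self, k: int) -> List[str]:
        """Return up to k candidate ids, best score first, ties by id."""
        return sorted(self.table, key=lambda e: (-self.score(e), e))[:k]


# Hypothetical constraints; unknown attributes count as unsatisfied.
memory = CandidateMemory(constraints={
    "under_50": lambda a: a.get("price", float("inf")) < 50,
    "rating_4_plus": lambda a: a.get("rating", 0.0) >= 4.0,
})
memory.upsert("headphones_a", price=45.0)
memory.upsert("headphones_b", price=60.0, rating=4.5)
memory.upsert("headphones_a", rating=4.2)  # attribute seen on a later page
memory.upsert("headphones_c", price=30.0, rating=3.1)

best = memory.top_k(2)
```

Keeping the candidates outside the LLM context means an attribute observed many steps earlier is still available when the final choice is made, which is the kind of constraint tracking the module is meant to support.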

Figure 1: Illustration of StructuredAgent solving a web task via greedy DFS of a dynamically constructed AND/OR tree. The root node represents the task objective and is expanded into subtasks that are progressively refined and executed. Node types are color-coded to distinguish OR ($\vee$), AND (

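Experimental Results

Research Questions

RQ1: Does StructuredAgent improve success rates over strong baselines on long-horizon web tasks across multiple benchmarks and backbone models?
RQ2: How does hierarchical AND/OR planning with dynamic revision and error backpropagation affect the reliability and interpretability of web task execution?
RQ3: What effect does the structured memory module have on constrained information-seeking tasks?
RQ4: Do StructuredAgent's advantages generalize across model families (e.g., Claude, Kimi-k2, GPT-based backbones)?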

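Key Findings

StructuredAgent achieves the highest average score on Amazon Easy tasks (83.3% in the evaluation).
StructuredAgentMem improves Amazon Hard performance by about 5% under human evaluation.
On WebVoyager Easy, StructuredAgent remains competitive with the baselines, trailing by only about 1.5%.
StructuredAgent generally outperforms the baselines on WebArena Shopping and Reddit tasks, with gains of about 6% to 20%.
The benefits of hierarchical planning persist with a stronger backbone model (Claude 3.7).
The method extends to model families beyond Claude (e.g., Kimi-k2-0905), although sensitivity to long contexts may affect the gains.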

This review was generated by AI and reviewed by human editors.