QUICK REVIEW

[Paper Review] Safe and Scalable Web Agent Learning via Recreated Websites

Hyungjoo Chae, Jungsoo Park|arXiv (Cornell University)|Mar 11, 2026
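Machine Learning and Algorithms | Cited by 0
One-Sentence Summary

VeriEnv clones real websites into executable synthetic environments with verifiable task rewards, enabling safe, scalable, self-evolving web agent learning without real-world interaction.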

ABSTRACT

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments. By exposing controlled internal access via a Python SDK, VeriEnv enables agents to self-generate tasks with deterministic, programmatically verifiable rewards, eliminating reliance on heuristic or LLM-based judges. This design decouples agent learning from unsafe real-world interaction while enabling scalable self-evolution through environment expansion. Through experiments on web agent benchmarks, we show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments. Code and resources will be released at https://github.com/kyle8581/VeriEnv upon acceptance.

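Motivation and Objectives

  • Enable safe, scalable learning of autonomous web agents without interacting with real websites.
  • Propose a pipeline that recreates real sites as executable environments with controlled access.
  • Generate verifiable tasks and judges that provide deterministic, checkable rewards.
  • Achieve generalization to unseen sites and site-specific mastery through self-evolving training.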

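Proposed Method

  • Use a coding agent to clone the target website into a synthetic environment (code C, database D, Python SDK P).
  • Create verifiable tasks by prompting large language models (LLMs) to generate tasks with executable verification programs, with the verification programs implemented in P.
  • At the end of each episode, run the verification predicate on the environment state to provide a deterministic reward.
  • Train agents in the synthetic environments through a self-evolution loop driven by verifiable rewards.
  • Evaluate generalization to unseen websites and the effect of the number of environments on performance.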
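
The verification step above can be illustrated with a minimal sketch. It is not the paper's implementation: the `ShopSDK` class, the `cart` table, the `Task` structure, and the toy "agent actions" are all hypothetical stand-ins, assumed here only to show how a predicate evaluated on the final database state could yield a deterministic 0/1 reward without an LLM judge.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Tuple


class ShopSDK:
    """Hypothetical stand-in for the SDK P: controlled read access to database D."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def cart_quantity(self, username: str, product: str) -> int:
        # Sum the quantities so that repeated inserts for the same item are counted.
        row = self.conn.execute(
            "SELECT COALESCE(SUM(quantity), 0) FROM cart "
            "WHERE username = ? AND product = ?",
            (username, product),
        ).fetchone()
        return row[0]


@dataclass
class Task:
    instruction: str
    judge: Callable[[ShopSDK], bool]  # executable verification predicate


def make_env() -> sqlite3.Connection:
    # Fresh in-memory database: a toy version of a resettable cloned site.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cart (username TEXT, product TEXT, quantity INTEGER)")
    return conn


def reward(task: Task, sdk: ShopSDK) -> float:
    # Deterministic reward: 1.0 if the predicate holds on the final state, else 0.0.
    return 1.0 if task.judge(sdk) else 0.0


def run_episode(actions: List[Tuple[str, str, int]], task: Task) -> float:
    # Each action is an assumed (username, product, quantity) write that stands in
    # for the effect of real browser interactions on the site's database.
    conn = make_env()
    for username, product, quantity in actions:
        conn.execute("INSERT INTO cart VALUES (?, ?, ?)", (username, product, quantity))
    return reward(task, ShopSDK(conn))


task = Task(
    instruction="Add two units of 'blue mug' to alice's cart",
    judge=lambda sdk: sdk.cart_quantity("alice", "blue mug") == 2,
)

# Self-evolution style filtering: keep only episodes whose reward is verified.
candidates = [
    [("alice", "blue mug", 2)],
    [("alice", "blue mug", 1)],
    [("bob", "blue mug", 2)],
]
kept = [actions for actions in candidates if run_episode(actions, task) == 1.0]
```

In this sketch the reward depends only on the environment state at the end of the episode, so rerunning the same episode always gives the same result. How VeriEnv actually structures its SDK, tasks, and training loop is not specified in this summary.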
Figure 1 : Comparison between the traditional self-evolution paradigm and our verifiable environment framework. (a) In traditional settings, agents interact directly with real-world environments and rely on unvalidated synthetic tasks and non-verifiable, LLM-based reward signals, leading to unsafe e

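Experimental Results

Research Questions

  • RQ1: Can agents trained in verifiable synthetic environments generalize to unseen real websites?
  • RQ2: Does increasing the number of training environments improve web agent performance?
  • RQ3: Can site-specific mastery be achieved through repeated self-evolving training in a cloned environment?
  • RQ4: How do verifiable task generation and rewards compare with LLM-based judges and non-verifiable methods?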

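Key Findings

  • Agents trained with VeriEnv improve over their baseline models on the WebArena and Mind2Web-Online benchmarks (+6.06 to +9.09 points, depending on the baseline model).
  • VeriEnv-trained agents generalize to unseen websites and cross-domain tasks.
  • Repeated training in cloned environments yields site-specific mastery, and VeriEnv's gains are stronger and more stable than those of non-verifiable methods.
  • Increasing the number of training environments brings consistent performance gains, indicating that environment-centric learning is effective.
  • Human evaluation shows high environment quality (functional correctness of about 90%), high visual scores (4.7/5), task executability of about 90%, and judge correctness of about 76%.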
Figure 2 : Overview of VeriEnv . VeriEnv first clones a real website into a fully instrumented synthetic environment (code $C$ , database $D$ , and a Python SDK $P$ ) via coding agent, then uses task and judge generators to produce tasks at varying difficulty and verify both tasks and judges by inte

This review was generated by AI and reviewed by a human editor.