QUICK REVIEW

[Paper Review] WebNavigator: Global Web Navigation via Interaction Graph Retrieval

Xuanwang Zhang, Yuteng Han | arXiv (Cornell University) | 2026-03-20

Multimodal Machine Learning Applications · Citations: 0

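One-Line Summary

WebNavigator reframes web navigation as a deterministic Retrieve-Reason-Teleport workflow over a prebuilt Interaction Graph. This addresses Topological Blindness and achieves state-of-the-art performance on WebArena and Online-Mind2Web with a 6-action interface.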

ABSTRACT

Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the environment. To overcome this limitation, we introduce WebNavigator, which reframes web navigation from probabilistic exploration into deterministic retrieval and pathfinding. WebNavigator constructs Interaction Graphs via zero-token cost heuristic exploration offline and implements a Retrieve-Reason-Teleport workflow for global navigation online. WebNavigator achieves state-of-the-art performance on WebArena and Online-Mind2Web. On WebArena multi-site tasks, WebNavigator achieves a 72.9% success rate, more than doubling the performance of enterprise-level agents. This work reveals that Topological Blindness, rather than model reasoning capabilities alone, is an underestimated bottleneck in autonomous web navigation.

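Motivation and Goals

Motivates a shift from reactive, trial-and-error web navigation to global planning over a persistent environment graph.
Proposes Offline Interaction Graph construction, which captures site topology without relying on LLMs.
Introduces Online Retrieval-Augmented Navigation, which uses a Retrieve-Reason-Teleport workflow for deterministic navigation.
Demonstrates state-of-the-art performance on WebArena and Online-Mind2Web with global graph navigation and a 6-action interface, highlighting a reduced action space and improved cross-site generalization.
Shows empirically how the completeness of environment knowledge and the bandwidth of multimodal retrieval affect planning efficiency, with diminishing returns once exploration is sufficient and the robust 6-action design is in place.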

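Proposed Method

Builds the Interaction Graph G offline through heuristic auto-exploration. This process interacts with dynamic elements and captures multimodal observations (screenshots and structural metadata).
Embeds every node and indexes it in a vector database, so retrieval during online navigation requires no LLM calls.
At inference time, a multi-stage Global-View navigator works in three steps. It Retrieves the top-k candidate nodes via multimodal retrieval, Reasons with a multimodal LLM to select the best candidate, and Teleports by computing the shortest path in G to the target node.
The whole workflow is exposed as a single navigate(domain, query) action that covers planning, domain switching, and low-level browser state management.
Uses late-interaction, token-level embedding similarity to preserve fine-grained alignment between queries and observations.
Shows that the compact 6-action interface and global graph navigation enable more deterministic, globally optimal navigation than purely reactive baselines.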
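The Retrieve-Reason-Teleport loop can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `InteractionGraph` with one hand-made embedding per page state. Plain cosine similarity stands in for the multimodal retriever. The `reason` step is a placeholder that takes the top hit instead of calling a multimodal LLM. `teleport` is a breadth-first shortest-path search that returns the recorded actions. None of these names come from the paper.

```python
from __future__ import annotations

import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InteractionGraph:
    """Hypothetical Interaction Graph: page-state nodes with one embedding each,
    plus directed edges labelled by the UI action that leads to the next state."""
    root: str
    embeddings: dict[str, list[float]] = field(default_factory=dict)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(graph: InteractionGraph, query: list[float], k: int) -> list[str]:
    """Retrieve: rank every node by similarity to the query and keep the top-k."""
    ranked = sorted(graph.embeddings,
                    key=lambda n: cosine(query, graph.embeddings[n]),
                    reverse=True)
    return ranked[:k]


def reason(candidates: list[str]) -> str:
    """Reason: placeholder for the multimodal LLM choice (assumption: take the top hit)."""
    return candidates[0]


def teleport(graph: InteractionGraph, target: str) -> list[str] | None:
    """Teleport: BFS shortest path from the root, returned as the action sequence."""
    parent: dict[str, tuple[str, str] | None] = {graph.root: None}
    queue = deque([graph.root])
    while queue:
        node = queue.popleft()
        if node == target:
            actions = []
            # Walk back through recorded parents, collecting the action on each edge.
            while parent[node] is not None:
                node, action = parent[node]
                actions.append(action)
            return actions[::-1]
        for action, nxt in graph.edges.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, action)
                queue.append(nxt)
    return None  # target not reachable from the root


def navigate(graphs: dict[str, InteractionGraph], domain: str,
             query: list[float], k: int = 3) -> tuple[str, list[str] | None]:
    """Hypothetical navigate(domain, query): one call runs Retrieve -> Reason -> Teleport."""
    graph = graphs[domain]
    target = reason(retrieve(graph, query, k))
    return target, teleport(graph, target)


# Toy CMS graph with hand-made 3-d embeddings (illustrative values only).
cms = InteractionGraph(
    root="home",
    embeddings={"home": [1.0, 0.0, 0.0],
                "customers": [0.0, 1.0, 0.0],
                "customer_detail": [0.0, 0.9, 0.1]},
    edges={"home": [("click:Customers", "customers")],
           "customers": [("click:Customer row", "customer_detail")]},
)
target, actions = navigate({"cms": cms}, "cms", [0.0, 0.9, 0.1], k=2)
```

In the review's description, Reason is a multimodal LLM call and retrieval runs against a vector database. Substituting those affects only `retrieve` and `reason`. Teleport remains an ordinary graph search, and this is where the determinism comes from.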

Figure 1: Overview of WebNavigator. WebNavigator resolves Topological Blindness via a two-phase paradigm. (1) Offline Interaction Graph Construction. A heuristic auto-exploration engine discovers dynamic page observations at zero-token cost and indexes all observations into a vector database. (2) O…
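The offline phase in panel (1) can be pictured as a plain breadth-first crawl. This sketch assumes a hypothetical `TOY_SITE` dict that stands in for a live browser. A real engine would click elements in the browser, capture a screenshot and metadata for each state, and embed them. `max_depth` is an illustrative knob for exploration completeness, not a parameter name from the paper. No model is queried anywhere, which is the sense of "zero-token cost".

```python
from __future__ import annotations

from collections import deque

# Hypothetical stand-in for a live browser: page -> [(interactive element, resulting page)].
TOY_SITE: dict[str, list[tuple[str, str]]] = {
    "home": [("click:Catalog", "catalog"), ("click:Customers", "customers")],
    "catalog": [("click:Products", "products"), ("click:Home", "home")],
    "customers": [("click:Customer row", "customer_detail")],
    "products": [],
    "customer_detail": [("click:Back", "customers")],
}


def explore(site: dict[str, list[tuple[str, str]]], root: str,
            max_depth: int) -> dict[str, list[tuple[str, str]]]:
    """Breadth-first heuristic exploration: try every interactive element on each
    discovered page, record (action, next_page) edges, and stop expanding at
    max_depth. Uses no LLM calls."""
    edges: dict[str, list[tuple[str, str]]] = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # discovered but not expanded
        for action, nxt in site[page]:
            edges[page].append((action, nxt))
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                edges[nxt] = []
                queue.append(nxt)
    return edges


shallow = explore(TOY_SITE, "home", max_depth=1)
deep = explore(TOY_SITE, "home", max_depth=2)
```

A shallow crawl finds only the pages linked directly from the root. A deeper crawl reaches every page in the toy site. This mirrors the exploration-depth effect discussed in the results.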

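Experimental Results

Research Questions

RQ1: Can a compact, offline-constructed Interaction Graph capture enough global structure to enable deterministic web navigation?
RQ2: Does moving navigation from probabilistic exploration to Retrieve-Reason-Teleport over the graph mitigate Topological Blindness across diverse sites?
RQ3: How do knowledge completeness and multimodal retrieval bandwidth affect navigation success?
RQ4: How does late-interaction retrieval compare with dense embeddings in retrieval quality and navigation performance?
RQ5: Is a single, domain-agnostic navigate(domain, query) interface sufficient to generalize across websites?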

Key Results

Method	Model	#Actions	WebArena success rate (%)	Multi-site success rate (%)	Shopping success rate (%)	CMS success rate (%)	Reddit success rate (%)	GitLab success rate (%)	Map success rate (%)	Online-Mind2Web success rate (%)
WebNavigator (Ours)	Qwen3-VL-32B-Instruct	6	47.8	43.8	44.9	45.1	75.5	50.6	44.0	39.7
WebNavigator (Ours)	GPT-4o	6	49.9	50.0	44.4	48.6	73.6	42.2	51.4	41.3
WebNavigator (Ours)	Claude-Sonnet-4	6	57.1	50.0	51.9	58.2	85.9	50.0	51.4	38.7
WebNavigator (Ours)	Gemini-2.5-Pro	6	63.3	72.9	51.9	66.5	85.9	62.2	53.2	52.7

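WebNavigator achieves state-of-the-art performance on WebArena and Online-Mind2Web. With Gemini-2.5-Pro, it reaches 72.9% on WebArena multi-site tasks, more than double the success rate of the enterprise-level agent CUGA.
On WebArena overall, WebNavigator reaches 49.9% with GPT-4o and 63.3% with Gemini-2.5-Pro, substantially outperforming prior methods. On multi-site tasks, GPT-4o reaches 50.0%.
On Online-Mind2Web, which covers 136 real-world websites, WebNavigator reaches 52.7% with Gemini-2.5-Pro, which indicates strong generalization.
The approach combines a 6-action interface built around navigate(domain, query) with the Retrieve-Reason-Teleport workflow, turning navigation into deterministic pathfinding over the Interaction Graph.
Late-interaction (token-level) retrieval outperforms dense embedding methods, which suggests that fine-grained visual-semantic matching matters.
Empirically, the completeness of environment knowledge (exploration depth) and the information bandwidth (k) strongly affect performance. Returns diminish once exploration is sufficient and the robust 6-action design is in place.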
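The review does not specify the retrieval model. This sketch assumes the standard ColBERT-style MaxSim operator for late interaction and compares it with a mean-pooled dense baseline. The 3-d "token embeddings" are invented for illustration. A real system would use token or patch embeddings from a multimodal encoder. This is not a reproduction of the paper's ablation.

```python
from __future__ import annotations

import math


def _unit(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Late interaction (ColBERT-style MaxSim): each query token keeps its best
    cosine match among the document tokens, and the per-token maxima are summed."""
    docs = [_unit(t) for t in doc_tokens]
    return sum(max(_dot(_unit(q), d) for d in docs) for q in query_tokens)


def mean_pool(tokens: list[list[float]]) -> list[float]:
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def dense(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Dense baseline: mean-pool each side into one vector, then take the cosine."""
    return _dot(_unit(mean_pool(query_tokens)), _unit(mean_pool(doc_tokens)))


# Hand-made token embeddings (illustrative only).
QUERY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]              # two query concepts
PAGE_A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]] + [[0.0, 0.0, 1.0]] * 3  # exact matches + unrelated tokens
PAGE_B = [[1.0, 1.0, 0.0]]                               # one blended token, no exact match

late = {"A": maxsim(QUERY, PAGE_A), "B": maxsim(QUERY, PAGE_B)}
pooled = {"A": dense(QUERY, PAGE_A), "B": dense(QUERY, PAGE_B)}
```

MaxSim ranks page A first because page A contains both query concepts exactly. Mean pooling dilutes those matches with the unrelated tokens and ranks the blended page B first. This gives one intuition for why token-level matching may help when the relevant element is a small part of a busy page.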

Figure 2: Trajectory comparison on a multi-site task (WebArena 760), which requires retrieving a specific customer address from the CMS to plan a route on the Map. WebNavigator achieves human-level planning via two navigate(domain, query) actions, whereas the ReAct baseline prematurely terminates du


This review was generated by AI and reviewed by a human editor.