QUICK REVIEW

[Paper Review] From Reactive to Map-Based AI: Tuned Local LLMs for Semantic Zone Inference in Object-Goal Navigation

Yudai Noda, Kanji Tanaka | arXiv (Cornell University) | Mar 9, 2026

Multimodal Machine Learning Applications | Cited by 0

One-Sentence Summary

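The paper proposes a map-based AI framework that uses a LoRA-fine-tuned Llama-2 model to infer semantic zones from observed objects, integrates them into a hybrid topological-grid map for principled exploration, and outperforms frontier-based and reactive baselines in AI2-THOR.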

ABSTRACT

Object-Goal Navigation (ObjectNav) requires an agent to find and navigate to a target object category in unknown environments. While recent Large Language Model (LLM)-based agents exhibit zero-shot reasoning, they often rely on a "reactive" paradigm that lacks explicit spatial memory, leading to redundant exploration and myopic behaviors. To address these limitations, we propose a transition from reactive AI to "Map-Based AI" by integrating LLM-based semantic inference with a hybrid topological-grid mapping system. Our framework employs a fine-tuned Llama-2 model via Low-Rank Adaptation (LoRA) to infer semantic zone categories and target existence probabilities from verbalized object observations. In this study, a "zone" is defined as a functional area described by the set of observed objects, providing crucial semantic co-occurrence cues for finding the target. This semantic information is integrated into a topological graph, enabling the agent to prioritize high-probability areas and perform systematic exploration via Traveling Salesman Problem (TSP) optimization. Evaluations in the AI2-THOR simulator demonstrate that our approach significantly outperforms traditional frontier exploration and reactive LLM baselines, achieving a superior Success Rate (SR) and Success weighted by Path Length (SPL).

Research Motivation and Objectives

Address the limitations of reactive LLM agents in Object-Goal Navigation (ObjectNav): without explicit spatial memory, they explore redundantly and behave myopically.
Define semantic zones as object-based functional regions to guide navigation.
Develop a hybrid topological-grid map to combine semantic reasoning with geometric planning.
Enable global planning via A* and TSP-based exploration to achieve systematic coverage.
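
The summary names A* as the global planner but gives no details. As a rough illustration only, the sketch below runs A* on a small occupancy grid. The 4-connected moves, unit step cost, Manhattan heuristic, and the `astar` function name are assumptions made for this example and are not taken from the paper.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (0 = free, 1 = occupied).

    Returns a list of (row, col) cells from start to goal, or None if the
    goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None

# Toy map: a wall in the middle row with a single gap on the right.
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(path)
```

In a hybrid map of the kind described, a planner like this would operate on the metric grid layer, while the topological layer decides which zone to head for.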

Proposed Method

Fine-tune a Llama-2 model with LoRA on object-zone co-occurrence data from AI2-THOR to infer zone categories and target existence.
Verbalize the currently observed object set into a prompt from which the model infers the zone category (Z_est) and the target existence probability (P_target).
Implement a dual-layer map: a metric occupancy grid for local planning and a semantic topological graph where nodes are zones.
Use SBERT to compute semantic similarity between target and observed objects to guide zone relevance.
Prioritize semantic frontiers with a weighted heuristic incorporating distance and P_target.
Solve local scanning order via Traveling Salesman Problem (TSP) to minimize path length while surveying high-probability zones.
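
The last two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `ZoneNode` structure, the linear scoring form, the weights `alpha` and `beta`, the zone coordinates, and the exhaustive TSP solver are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from itertools import permutations
from math import dist

@dataclass
class ZoneNode:
    """A node of the semantic topological graph (hypothetical structure)."""
    name: str         # inferred zone category, e.g. "kitchen"
    position: tuple   # (x, y) location on the metric grid
    p_target: float   # target existence probability from the LLM

def frontier_score(node, agent_pos, alpha=1.0, beta=0.2):
    """Higher is better: reward P_target, penalise travel distance.

    alpha and beta are illustrative weights, not values from the paper.
    """
    return alpha * node.p_target - beta * dist(agent_pos, node.position)

def tsp_order(agent_pos, nodes):
    """Exhaustive open-path TSP: visit every node once, starting at the agent.

    Only practical for a handful of nodes, since it tries all n! orders.
    """
    best_order, best_len = [], float("inf")
    for order in permutations(nodes):
        length, here = 0.0, agent_pos
        for node in order:
            length += dist(here, node.position)
            here = node.position
        if length < best_len:
            best_order, best_len = list(order), length
    return best_order, best_len

agent = (0.0, 0.0)
zones = [
    ZoneNode("kitchen", (4.0, 0.0), 0.8),
    ZoneNode("bedroom", (1.0, 0.0), 0.1),
    ZoneNode("living room", (2.0, 0.0), 0.5),
]
# Keep the two most promising zones, then plan the shortest tour over them.
shortlist = sorted(zones, key=lambda z: frontier_score(z, agent), reverse=True)[:2]
order, length = tsp_order(agent, shortlist)
print([z.name for z in order], length)
```

Exhaustive search keeps the example short and exact. A real system with many candidate zones would likely need a heuristic or approximate TSP solver instead.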

Experimental Results

Research Questions

RQ1: Can LoRA-fine-tuned LLMs accurately infer semantic zones from object observations in indoor environments?
RQ2: Does integrating semantic zones with a hybrid map improve exploration efficiency and success in ObjectNav?
RQ3: What is the impact of semantic priors on frontier selection and path planning compared to purely geometric approaches?
RQ4: How does domain-specific fine-tuning affect navigation performance and ablation outcomes?

Key Findings

The proposed method achieves an 85% Success Rate (SR).
The method achieves an SPL of 0.52, outperforming the Frontier baseline (0.31) and the Reactive LLM baseline.
LoRA-fine-tuned zone inference accuracy reaches 92%.
Replacing the fine-tuned model with a zero-shot (non-LoRA) model increases total distance traveled by 30% due to redundant scanning.
Reactive LLM without a map suffers from myopic behavior, while frontier-only methods lack semantic guidance.
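
For readers unfamiliar with the two metrics, the sketch below computes SR and SPL using the standard definition of SPL: the mean over episodes of success times the shortest path length divided by the larger of the agent's path length and the shortest path length. The episode values are made up for illustration and are not results from the paper. The function names are also hypothetical, and path lengths are assumed to be positive.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)

def spl(episodes):
    """Success weighted by Path Length.

    Each episode is (success, shortest_path_length, agent_path_length).
    A failed episode contributes 0; a successful one contributes the ratio
    of the shortest path to the path actually taken, capped at 1.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)

# Illustrative episodes only (not data from the paper).
episodes = [
    (True, 5.0, 5.0),    # optimal path: contributes 1.0
    (True, 4.0, 8.0),    # twice the optimal length: contributes 0.5
    (False, 6.0, 10.0),  # failure: contributes 0.0
    (True, 3.0, 4.0),    # contributes 0.75
]
print(success_rate(episodes), spl(episodes))
```

Because SPL discounts each success by path efficiency, it can never exceed SR, which is why redundant scanning lowers SPL even when the target is eventually found.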


This review was generated by AI and checked by a human editor.