QUICK REVIEW

[Paper Review] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

Haoqi Yuan, Chi Zhang|arXiv (Cornell University)|2023. 03. 29.

Multimodal Machine Learning Applications|Citations: 11

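One-Line Summary

Plan4MC learns fine-grained skills with intrinsic rewards, builds a skill graph with an LLM, and uses a DFS-based skill planner to solve 40 diverse Minecraft tasks without demonstrations, with improved sample efficiency.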

ABSTRACT

We study building multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over the skills. Using the popular open-world game Minecraft as the testbed, we propose three types of fine-grained basic skills, and use RL with intrinsic rewards to acquire skills. A novel Finding-skill that performs exploration to find diverse items provides better initialization for other skills, improving the sample efficiency for skill learning. In skill planning, we leverage the prior knowledge in Large Language Models to find the relationships between skills and build a skill graph. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 40 diverse Minecraft tasks, where many tasks require sequentially executing for more than 10 skills. Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method to solve Minecraft Tech Tree tasks. The project's website and code can be found at https://sites.google.com/view/plan4mc.

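Research Motivation and Goals

Build multi-task agents in open-world environments without human demonstrations.
Decompose long-horizon tasks into sequences of fine-grained basic skills: Finding, Manipulation, and Crafting.
Learn skills with intrinsic rewards, and introduce a novel Finding-skill that improves exploration and gives other skills a better initialization.
Use an LLM-derived skill graph and a DFS-based planner to generate executable skill sequences for each task.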

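Proposed Method

Define three types of fine-grained Minecraft skills: Finding-skills, Manipulation-skills, and Crafting-skills.
Learn the basic skills with RL using intrinsic rewards; introduce a hierarchical Finding-skill for exploration that improves the initialization of the other skills.
Build the skill graph by prompting an LLM to describe the relationships and prerequisites among skills, then run DFS-based planning on the graph to generate executable skill sequences.
Develop a skill search algorithm that repeatedly executes the planned skill, updates the inventory, and plans the next step.
Compare Plan4MC against baselines on 40 MineDojo tasks: MineAgent, Plan4MC without the Find-skill, interactive LLM planning, and zero-shot and half-step variants.
Show that Plan4MC is more sample-efficient and achieves higher success rates on long-horizon tasks.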
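
The skill-graph planning step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Skill` class, the `SKILLS` table, the skill names, and the `plan_skills` function are all hypothetical, and the item quantities are hand-written from standard Minecraft recipes rather than obtained from an LLM. The sketch only shows how a depth-first search over prerequisites, combined with a simulated inventory, can turn a target item into an ordered skill sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str                                     # hypothetical skill identifier
    obtains: str                                  # item gained per execution
    amount: int = 1                               # units gained per execution
    consumes: dict = field(default_factory=dict)  # item -> quantity used up
    requires: dict = field(default_factory=dict)  # item -> quantity needed but kept


# Hypothetical skill graph, keyed by the item each skill obtains.
SKILLS = {
    "tree_nearby": Skill("find_tree", "tree_nearby"),
    "log": Skill("harvest_log", "log", requires={"tree_nearby": 1}),
    "planks": Skill("craft_planks", "planks", amount=4, consumes={"log": 1}),
    "stick": Skill("craft_stick", "stick", amount=4, consumes={"planks": 2}),
    "crafting_table": Skill("craft_table", "crafting_table", consumes={"planks": 4}),
    "wooden_pickaxe": Skill(
        "craft_wooden_pickaxe",
        "wooden_pickaxe",
        consumes={"planks": 3, "stick": 2},
        requires={"crafting_table": 1},
    ),
}

MAX_DEPTH = 32  # guard against cyclic skill graphs


def _dfs(item, qty, inv, skills, steps, depth):
    """Append skills to `steps` until the simulated inventory holds `qty` of `item`."""
    if inv.get(item, 0) >= qty:
        return
    if depth > MAX_DEPTH:
        raise RuntimeError(f"search too deep while planning for {item!r}")
    if item not in skills:
        raise ValueError(f"no skill obtains {item!r}")
    skill = skills[item]
    needs = {**skill.requires, **skill.consumes}
    while inv.get(item, 0) < qty:
        # Satisfying one prerequisite can consume another, so repeat
        # until every prerequisite holds at the same time.
        while any(inv.get(i, 0) < n for i, n in needs.items()):
            for i, n in needs.items():
                _dfs(i, n, inv, skills, steps, depth + 1)
        for i, n in skill.consumes.items():
            inv[i] -= n
        inv[item] = inv.get(item, 0) + skill.amount
        steps.append(skill.name)


def plan_skills(target, qty, inventory, skills=SKILLS):
    """Return an ordered list of skill names; the caller's inventory is not modified."""
    inv = dict(inventory)
    steps = []
    _dfs(target, qty, inv, skills, steps, depth=0)
    return steps


plan = plan_skills("wooden_pickaxe", 1, {})
```

In the loop the review describes, an agent would execute only the first skill of the plan, read back its real inventory, and call `plan_skills` again with that inventory, so a failed skill execution is absorbed by the next planning call rather than invalidating the whole sequence.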

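Experimental Results

Research Questions

RQ1: Can fine-grained basic skills be learned without demonstrations in a challenging open-world environment?
RQ2: Does introducing the Finding-skill improve sample efficiency and the overall success rate on long-horizon Minecraft tasks?
RQ3: Can an LLM-generated skill graph combined with a DFS-based planner effectively decompose and solve open-world tasks?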

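Key Results

Plan4MC accomplishes 40 diverse Minecraft tasks, often requiring 2–30 basic skills per task.
Plan4MC outperforms the baselines on four task sets and is more sample-efficient than other demonstration-free RL methods.
Including the Finding-skill substantially improves success rates compared with omitting it.
The LLM-based skill graph combined with the skill search algorithm yields robust, executable plans with fewer planning mistakes than naive LLM-only planning.
Interactive LLM planning is comparable to Plan4MC on simple tasks but performs worse on longer-horizon tasks because of planning errors.
Plan4MC shows strong performance on crafting an iron pickaxe in the Minecraft Tech Tree.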


This review was generated by AI and checked by a human editor.