QUICK REVIEW

[Paper Review] Text2Motion: From Natural Language Instructions to Feasible Plans

Kevin Lin, Christopher Agia | arXiv (Cornell University) | 2023. 03. 21.

AI-based Problem Solving and Planning | References: 71 | Citations: 10

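One-Line Summary

Text2Motion combines large language model planning with a library of learned skills and a geometric feasibility planner to generate and verify multi-step, long-horizon manipulation plans. It achieves an 82% success rate on challenging tasks, whereas prior language-based planners achieve only 13%.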

ABSTRACT

We propose Text2Motion, a language-based planning framework enabling robots to solve sequential manipulation tasks that require long-horizon reasoning. Given a natural language instruction, our framework constructs both a task- and motion-level plan that is verified to reach inferred symbolic goals. Text2Motion uses feasibility heuristics encoded in Q-functions of a library of skills to guide task planning with Large Language Models. Whereas previous language-based planners only consider the feasibility of individual skills, Text2Motion actively resolves geometric dependencies spanning skill sequences by performing geometric feasibility planning during its search. We evaluate our method on a suite of problems that require long-horizon reasoning, interpretation of abstract goals, and handling of partial affordance perception. Our experiments show that Text2Motion can solve these challenging problems with a success rate of 82%, while prior state-of-the-art language-based planning methods only achieve 13%. Text2Motion thus provides promising generalization characteristics to semantically diverse sequential manipulation tasks with geometric dependencies between skills.

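Motivation and Objectives

Emphasizes the need for long-horizon robot planning that converts natural language instructions into executable symbolic and geometric plans.
Integrates an LLM with a library of manipulation skills and a geometric feasibility planner to verify plan feasibility before execution.
Develops a hybrid strategy that combines shooting-based and search-based planning to handle these tasks.
Provides a plan termination mechanism that infers goal states from the natural language instruction, so that task completion can be verified before execution.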

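Proposed Method

Uses an LLM to generate goal proposals and candidate skill sequences from the natural language instruction and a scene description.
Represents each skill as a policy over a parameterized manipulation primitive, paired with a Q-function that serves as its feasibility estimate.
Applies geometric feasibility planning (STAP) to maximize the product of the success probabilities of the skills in a plan (Eq. 4–5).
Implements a shooting-based planner that generates complete candidate skill sequences and selects the best one by feasibility score (Algorithm 1).
Implements a search-based planner that selects the next skill incrementally by combining the LLM's usefulness score with geometric feasibility, interleaving shooting where possible (Eq. 8–12).
Proposes the hybrid Text2Motion algorithm, which alternates between shooting and greedy-search steps to find a geometrically feasible plan before execution (Algorithm 3).
Includes an out-of-distribution (OOD) detector based on the variance of a Q-value ensemble, used to reject invalid OOD skills (Eq. 13).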
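
The scoring ideas in the list above can be illustrated with a small Python sketch. It is not the authors' implementation: the skill names, Q-value numbers, variance threshold, and the functions `is_in_distribution`, `plan_feasibility`, and `shoot` are all hypothetical, and the sketch treats each skill's Q-values as given rather than optimizing continuous skill parameters as a geometric feasibility planner would. It only shows how a product of per-skill success estimates can rank candidate plans, and how ensemble variance can flag a skill as out-of-distribution.

```python
from math import prod
from statistics import pvariance


def is_in_distribution(q_ensemble, max_variance=0.01):
    """Accept a skill only if its ensemble members roughly agree.

    max_variance is an illustrative threshold, not a value from the paper.
    """
    return pvariance(q_ensemble) <= max_variance


def plan_feasibility(plan, max_variance=0.01):
    """Product of mean Q-values over the plan; 0.0 if any skill looks OOD."""
    if not all(is_in_distribution(q, max_variance) for _, q in plan):
        return 0.0
    return prod(sum(q) / len(q) for _, q in plan)


def shoot(candidates, max_variance=0.01):
    """Return the candidate plan with the highest feasibility score."""
    return max(candidates, key=lambda p: plan_feasibility(p, max_variance))


# Hypothetical candidates: each plan is a list of (skill, Q-value ensemble).
candidates = [
    # Direct pick-and-place: the place step is estimated as unlikely to succeed.
    [("pick(box)", [0.90, 0.92, 0.88]),
     ("place(box, rack)", [0.30, 0.32, 0.28])],
    # Pull the box closer first: longer, but every step is likely to succeed.
    [("pull(box, hook)", [0.80, 0.82, 0.78]),
     ("pick(box)", [0.90, 0.90, 0.90]),
     ("place(box, rack)", [0.85, 0.85, 0.85])],
    # Contains a skill on which the ensemble disagrees strongly (treated as OOD).
    [("pick(table)", [0.10, 0.90, 0.50]),
     ("place(table, rack)", [0.90, 0.90, 0.90])],
]

best = shoot(candidates)
print([skill for skill, _ in best])
```

In this toy data the longer plan wins because a single low-probability step drags down the product for the shorter one, which is the kind of cross-skill dependency that per-skill scoring alone would miss.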

Figure 1: To carry out the instruction “get two primary-colored objects onto the rack,” the robot must apply symbolic reasoning over the scene description and language instruction to deduce what skills should be executed to acquire a second primary-colored object, after noticing that a red object is

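Experimental Results

Research Questions

RQ1: How can the correctness and feasibility of LLM-generated plans be verified for long-horizon robot manipulation?
RQ2: Does integrating geometric feasibility planning with an LLM improve success rates on tasks with geometric dependencies?
RQ3: What advantages does the hybrid shooting-search planning strategy have over myopic language-based planners under partial affordance perception?
RQ4: Can predicting goals up front reliably ensure plan termination before execution?
RQ5: What are the benefits and limitations of binding plans to a skill library with learned dynamics and Q-functions?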

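Key Results

Text2Motion achieves an 82% success rate on a suite of long-horizon tabletop manipulation tasks.
In the same evaluation, prior state-of-the-art language-based planning methods achieve only 13%.
Geometric feasibility planning over skill sequences is essential for handling dependencies that span multiple steps.
On tasks with geometric dependencies, the hybrid planner that combines shooting and search outperforms both purely myopic and purely planning-based methods.
Predicting goals up front provides reliable plan termination before execution.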

Figure 2: Shooting and greedy-search planning overview. Both shooting and greedy-search planners use the LLM to predict the set of valid goal states given the user’s natural language instruction and a description of the current state of the environment. These predicted goals are used to decide when

This review was generated by AI and reviewed by a human editor.