QUICK REVIEW

[Paper Review] Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

Deepak Pathak, Chris Xiaoxuan Lu | arXiv (Cornell University) | 2019-02-14

Modular Robots and Swarm Intelligence | References: 35 | Citations: 45

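One-Line Summary

This paper trains primitive robotic limbs that self-assemble into composite morphologies, learns a modular controller with Dynamic Graph Networks, and demonstrates better generalization to unseen morphologies and environments than fixed-morphology baselines.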

ABSTRACT

Contemporary sensorimotor learning approaches typically start with an existing complex agent (e.g., a robotic arm), which they learn to control. In contrast, this paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies. Each primitive agent consists of a limb with a motor attached at one end. Limbs may choose to link up to form collectives. When a limb initiates a link-up action, and there is another limb nearby, the latter is magnetically connected to the 'parent' limb's motor. This forms a new single agent, which may further link with other agents. In this way, complex morphologies can emerge, controlled by a policy whose architecture is in explicit correspondence with the morphology. We evaluate the performance of these dynamic and modular agents in simulated environments. We demonstrate better generalization to test-time changes both in the environment, as well as in the structure of the agent, compared to static and monolithic baselines. Project video and code are available at https://pathak22.github.io/modular-assemblies/

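Motivation and Goals

Motivate modular self-assembly, inspired by multicellular organization, as a path toward adaptable and generalizable agents.
Treat link and unlink as actions within the RL framework, so that the control policy and the morphology evolve together.
Develop a modular policy whose structure follows the evolving morphology, using Dynamic Graph Networks (DGN).
Demonstrate improved zero-shot generalization to new morphologies and environments compared with monolithic baselines.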

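Proposed Method

The self-assembled agent is represented as a graph of limbs connected by magnetic joints.
Each limb runs a shared policy that outputs a torque together with linking/unlinking actions.
Dynamics: the graph topology changes over time according to the policy outputs (hence Dynamic Graph Network).
Messages are passed along edges to coordinate connected limbs, and each limb's input is restricted to local sensor data.
The policy is optimized with PPO to maximize the sum of limb rewards over the evolving graph.
Evaluation uses standing-up and locomotion tasks with varied terrain and varied numbers of limbs.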
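
The steps above can be read as a forest of limbs whose edges change at run time, with one set of policy weights reused at every node. The following is a minimal sketch under stated assumptions, not the paper's implementation: the names `LimbGraph`, `shared_limb_policy`, and `act` are hypothetical, and the scalar observations, the one-layer tanh policy, and the single bottom-up message pass are simplifications chosen for illustration.

```python
import math


class LimbGraph:
    """Forest of limbs. parent[i] is the limb whose motor limb i hangs from,
    or None if limb i is the root of its body. (Hypothetical helper.)"""

    def __init__(self, n_limbs):
        self.parent = [None] * n_limbs

    def children(self, i):
        return [j for j, p in enumerate(self.parent) if p == i]

    def root(self, i):
        while self.parent[i] is not None:
            i = self.parent[i]
        return i

    def link(self, parent, child):
        """Attach `child` to `parent`'s motor; return True if the link was made.
        Assumption: only a body root can be attached, and never to its own body."""
        if self.parent[child] is not None or self.root(parent) == child:
            return False
        self.parent[child] = parent
        return True

    def unlink(self, child):
        """Detach `child` (and the subtree below it) from its parent."""
        self.parent[child] = None


def shared_limb_policy(obs, msg_in, w):
    """One set of weights `w` for every limb.
    Returns (torque, link_logit, msg_out) from local input and incoming messages."""
    h = math.tanh(w["obs"] * obs + w["msg"] * msg_in)
    return w["torque"] * h, w["link"] * h, h


def act(graph, observations, w):
    """Bottom-up pass: each limb sums its children's messages, then acts."""
    outputs = [None] * len(graph.parent)

    def visit(i):
        msg_in = sum(visit(c) for c in graph.children(i))
        torque, link_logit, msg_out = shared_limb_policy(observations[i], msg_in, w)
        outputs[i] = (torque, link_logit)
        return msg_out

    for i, p in enumerate(graph.parent):
        if p is None:  # start one traversal per separate body
            visit(i)
    return outputs


# Illustrative weights; in training these would be learned (e.g. with PPO).
w = {"obs": 1.0, "msg": 1.0, "torque": 1.0, "link": 1.0}
g = LimbGraph(3)
g.link(0, 1)   # limb 1 attaches to limb 0's motor
g.link(1, 2)   # limb 2 attaches to limb 1's motor
actions = act(g, [0.5, 0.5, 0.5], w)
```

Because `w` does not depend on the number of limbs, the same weights can be applied unchanged to `LimbGraph(4)` or `LimbGraph(12)`, which is the property that makes evaluation across different limb counts possible in this kind of design.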

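Experimental Results

Research Questions

RQ1: Can jointly learned control and morphology policies generalize to unseen morphologies and environments?
RQ2: Do modular, graph-structured policies transfer to changes in the number of limbs better than a monolithic policy?
RQ3: How does message passing affect the coordination of control across dynamically assembled morphologies?

Key Results

DGN policies outperform monolithic baselines on the standing-up and locomotion tasks.
DGN policies show strong zero-shot generalization across limb counts (e.g., from 6 limbs to 4 or 12).
Software modularity (a shared limb policy) and hardware modularity (self-assembly) both contribute; together they yield better learning and generalization than either one alone.
DGN with message passing helps more on the task that requires long-horizon coordination (standing up) than on locomotion, where many different morphologies can succeed.
Policies trained under a morphology curriculum generalize better to new terrain and disturbances (wind, water, obstacles).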

This review was generated by AI and reviewed by a human editor.