QUICK REVIEW

[Paper Review] RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

Ajay Mandlekar, Yuke Zhu|arXiv (Cornell University)|Nov 7, 2018

Mobile Crowdsensing and Crowdsourcing · 81 citations

TL;DR

RoboTurk crowdsources 6-DoF robot demonstrations through mobile devices. It tolerates poor network conditions and produces large demonstration datasets from which effective imitation-learning policies can be trained.

ABSTRACT

Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to address this challenge. RoboTurk is a crowdsourcing platform for high quality 6-DoF trajectory based teleoperation through the use of widely available mobile devices (e.g. iPhone). We evaluate RoboTurk on three manipulation tasks of varying timescales (15-120s) and observe that our user interface is statistically similar to special purpose hardware such as virtual reality controllers in terms of task completion times. Furthermore, we observe that poor network conditions, such as low bandwidth and high delay links, do not substantially affect the remote users' ability to perform task demonstrations successfully on RoboTurk. Lastly, we demonstrate the efficacy of RoboTurk through the collection of a pilot dataset; using RoboTurk, we collected 137.5 hours of manipulation data from remote workers, amounting to over 2200 successful task demonstrations in 22 hours of total system usage. We show that the data obtained through RoboTurk enables policy learning on multi-step manipulation tasks with sparse rewards and that using larger quantities of demonstrations during policy learning provides benefits in terms of both learning consistency and final performance. For additional results, videos, and to download our pilot dataset, visit roboturk.stanford.edu (http://roboturk.stanford.edu/).
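The abstract's headline figures can be cross-checked with simple arithmetic. The sketch below assumes that the 137.5 hours count data summed across all remote workers, while the 22 hours count wall-clock system usage; under that reading, their ratio is the average number of simultaneous sessions. This interpretation is ours, not a number reported by the authors, and the function names are illustrative.

```python
# Figures quoted in the abstract.
DATA_HOURS = 137.5        # hours of manipulation data collected from remote workers
WALL_CLOCK_HOURS = 22.0   # hours of total system usage
SUCCESSFUL_DEMOS = 2200   # the abstract says "over 2200", so this is a lower bound


def implied_concurrency(data_hours: float, wall_clock_hours: float) -> float:
    """Average number of simultaneous sessions, ASSUMING data hours are summed over workers."""
    return data_hours / wall_clock_hours


def demos_per_wall_clock_hour(demos: int, wall_clock_hours: float) -> float:
    """Successful demonstrations gathered per hour of system usage (lower bound here)."""
    return demos / wall_clock_hours


if __name__ == "__main__":
    print(f"implied concurrency: {implied_concurrency(DATA_HOURS, WALL_CLOCK_HOURS):.2f}")
    print(f"demos per hour: {demos_per_wall_clock_hour(SUCCESSFUL_DEMOS, WALL_CLOCK_HOURS):.1f}")
```

Running it prints an implied concurrency of 6.25 and 100.0 successful demonstrations per hour of system usage; the second figure is a floor, since the abstract reports "over 2200" demonstrations.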

Motivation & Objective

Enable scalable collection of high-quality robotic demonstrations for imitation learning, removing the data bottleneck that has kept existing datasets modest in size.
Design a crowdsourcing platform that uses ubiquitous devices (iPhone) to teleoperate simulated robots in real time.
Evaluate user interfaces and network robustness to show that RoboTurk matches VR-controller performance and withstands poor connections.
Assemble a pilot dataset of demonstrations and use it to show that policies can be learned on sparse-reward tasks.

Proposed method

Implement a cloud-based platform that streams video and teleoperation commands with WebRTC for low-latency control.
Use an iPhone with ARKit as the motion controller, mapping the phone's 6-DoF pose to robot end-effector motion.
Provide a coordination server that creates dedicated teleoperation sessions per user for scalable multi-user operation.
Employ a modular architecture to easily extend to new tasks, simulators, and robots.
Conduct a user study comparing interfaces (Keyboard, 3D Mouse, VR Controller, Phone) and assess performance under varying network conditions.
Collect and publish a pilot dataset (over 2200 successful demonstrations, 137.5 hours of data) to support demonstration-guided reinforcement learning.
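The method summary says the phone's ARKit pose is mapped to end-effector movement but does not give the mapping. A minimal sketch follows, assuming a relative (delta) scheme: the phone's translation and rotation since a reference pose are applied to the end-effector's reference pose. It further assumes the phone's tracking frame is already aligned with the robot base frame and that quaternions are stored as (w, x, y, z). `Pose`, `map_phone_to_ee`, and `scale` are hypothetical names; this is not RoboTurk's actual code.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit norm


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q: Quat) -> Quat:
    """Conjugate; equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


@dataclass
class Pose:
    position: Vec3
    orientation: Quat


def map_phone_to_ee(phone_ref: Pose, phone_now: Pose, ee_ref: Pose, scale: float = 1.0) -> Pose:
    """HYPOTHETICAL delta mapping: move the end-effector as the phone moved since phone_ref.

    Assumes the phone's tracking frame coincides with the robot base frame.
    """
    # Scaled translation of the phone since the reference, added to the end-effector position.
    delta_p = tuple(scale * (n - r) for n, r in zip(phone_now.position, phone_ref.position))
    target_p = tuple(e + d for e, d in zip(ee_ref.position, delta_p))
    # Rotation the phone has undergone since the reference, expressed in the world frame,
    # then applied on top of the end-effector's reference orientation.
    delta_q = quat_mul(phone_now.orientation, quat_conj(phone_ref.orientation))
    target_q = quat_mul(delta_q, ee_ref.orientation)
    return Pose(target_p, target_q)
```

With `scale=0.5`, a 10 cm phone movement becomes a 5 cm end-effector movement. A scale below 1 is one plausible way to trade speed for precision; it is not a documented RoboTurk setting.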
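The coordination server is described only as creating a dedicated teleoperation session per user. The following in-process sketch shows that bookkeeping under our own assumptions: a fixed session cap, reconnecting users reclaim their existing session, and users beyond the cap are refused. `CoordinationServer`, `Session`, and `max_sessions` are hypothetical names. A real deployment would also launch a simulator instance and a WebRTC stream per session, which is marked here only by a comment.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class Session:
    session_id: int
    user_id: str
    task: str


@dataclass
class CoordinationServer:
    """HYPOTHETICAL coordinator giving each connected user a dedicated session."""
    max_sessions: int
    _sessions: Dict[str, Session] = field(default_factory=dict, init=False)
    _ids: Iterator[int] = field(default_factory=itertools.count, init=False)

    def connect(self, user_id: str, task: str) -> Optional[Session]:
        # A reconnecting user gets their existing session back rather than a second one.
        if user_id in self._sessions:
            return self._sessions[user_id]
        if len(self._sessions) >= self.max_sessions:
            return None  # at capacity; a real system might queue the user instead
        # Placeholder: this is where a simulator worker and video stream would start.
        session = Session(next(self._ids), user_id, task)
        self._sessions[user_id] = session
        return session

    def disconnect(self, user_id: str) -> None:
        # Free the slot so another user can be admitted.
        self._sessions.pop(user_id, None)

    def active(self) -> int:
        return len(self._sessions)
```

Keying sessions by user keeps one worker per person, which matches the "dedicated session" description; how RoboTurk actually handles capacity and reconnects is not stated in this review.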

Experimental results

Research questions

RQ1: Can RoboTurk collect large-scale, high-quality teleoperation demonstrations via crowdsourcing using common devices?
RQ2: How does the iPhone-based interface compare to VR and other inputs in terms of task completion times?
RQ3: Are RoboTurk demonstrations robust to network latency and bandwidth variations for remote teleoperation?
RQ4: Do larger demonstration datasets improve policy learning for sparse-reward manipulation tasks?

Key findings

The phone interface achieves completion times statistically similar to VR controllers and significantly faster than keyboard or 3D mouse on picking tasks.
Across network conditions ranging from the baseline to low bandwidth and high delay, completion-time distributions remain similar, indicating robustness to poor connections.
The pilot dataset contains 137.5 hours of data with over 2200 successful demonstrations, collected in 22 hours of total system usage.
Policy learning benefits from larger numbers of demonstrations; of the dataset sizes tested, 1000 demonstrations yields the best mean performance on both the can-picking and round-nut assembly tasks.
Starting PPO episodes from states sampled along the demonstrations shows that crowdsourced data makes sparse-reward manipulation learning feasible.
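The last finding refers to starting RL episodes from demonstration states. A minimal sketch of such a reset distribution follows. It assumes each demonstration is stored as a sequence of simulator states, that states are sampled uniformly across all demonstrations, and that a hypothetical `demo_prob` mixes in the task's ordinary initial states. The PPO update itself is not shown; the returned sampler would be called at the start of each rollout. This illustrates the general technique, not the authors' exact curriculum.

```python
import random
from typing import Any, Callable, List, Optional, Sequence


def make_reset_sampler(
    demos: Sequence[Sequence[Any]],
    default_reset: Callable[[], Any],
    demo_prob: float = 0.9,  # HYPOTHETICAL mixing probability, not from the paper
    rng: Optional[random.Random] = None,
) -> Callable[[], Any]:
    """Return a function that picks the initial state for each RL episode."""
    if not 0.0 <= demo_prob <= 1.0:
        raise ValueError("demo_prob must be in [0, 1]")
    rng = rng if rng is not None else random.Random()
    # Flatten every trajectory so that each recorded state is equally likely.
    states: List[Any] = [s for traj in demos for s in traj]
    if not states:
        raise ValueError("need at least one demonstration state")

    def sample_initial_state() -> Any:
        if rng.random() < demo_prob:
            return rng.choice(states)  # start partway through a demonstration
        return default_reset()         # otherwise use the task's normal reset

    return sample_initial_state
```

The intuition is that episodes starting near the end of a demonstration reach the goal, and hence the sparse reward, far more often than episodes that rely on random exploration from the default start state.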

This review was created by AI and reviewed by human editors.