Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators


Alexander Herzog
Google X
Kanishka Rao
Google Inc
Karol Hausman
Google Brain
Yao Lu
Google Research
Paul Wohlhart
Google Inc
Mengyuan Yan
Google Inc
Jessica Lin
Everyday Robots
Montserrat Gonzalez Arenas
Google Inc
Ted Xiao
Google Inc
Daniel Kappler
Google X
Daniel Ho
Google Inc
Jarek Rettinghouse
Everyday Robots
Yevgen Chebotar
Google Inc
Kuang-Huei Lee
Google Inc
Keerthana Gopalakrishnan
Google Inc
Ryan Julian
Google Inc
Adrian Li
Wayve
Chuyuan Fu
Everyday Robots
Bob Wei
Everyday Robots
Sangeetha Ramesh
Everyday Robots
Khem Holden
Google Inc
Kim Kleiven
Everyday Robots
David J Rendleman
Google Inc
Sean Kirmani
Everyday Robots
Jeffrey Bingham
Everyday Robots
Jonathan Weisz
Everyday Robots
Ying Xu
Everyday Robots
Wenlong Lu
Everyday Robots
Matthew Bennice
Everyday Robots
Cody Fong
Everyday Robots
David Do
Everyday Robots
Jessica Lam
Everyday Robots
Yunfei Bai
Google X
Benjie Holson
Google X
Michael Quinlan
Google X
Noah Brown
Google Inc
Mrinal Kalakrishnan
Google X
Julian Ibarz
Google Inc
Peter Pastor
Google X
Sergey Levine
Google Inc

Paper ID 22

Session 3. Self-supervision and RL for Manipulation

Poster Session Tuesday, July 11

Poster 22

Abstract: We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems to boost generalization to novel objects while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation comprises 4800 evaluation trials across 240 waste station configurations, which we use to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects.