RT-1: Robotics Transformer for Real-World Control at Scale


Anthony Brohan
Google Research
Noah Brown
Google Research
Justice Carbajal
Google Research
Yevgen Chebotar
Google Inc
Joseph Dabis
Google Research
Chelsea Finn
Google Brain
Keerthana Gopalakrishnan
Google Inc
Karol Hausman
Google Brain
Alexander Herzog
Google X
Jasmine Hsu
Google Inc
Julian Ibarz
Google Inc
Brian Ichter
Google Brain
Alex Irpan
Google Inc
Tomas Jackson
Google Research
Sally Jesmonth
Google Research
Nikhil Joshi
Google Research
Ryan Julian
Google Inc
Dmitry Kalashnikov
Google Inc
Yuheng Kuang
Google Research
Isabel Leal
Google Research
Kuang-Huei Lee
Google Inc
Sergey Levine
Google Inc
Yao Lu
Google Research
Utsav Malla
Google Research
Deeksha Manjunath
Google Research
Igor Mordatch
Google Inc
Ofir Nachum
Google Inc
Carolina Parada
Google Inc
Jodilyn Peralta
Google Inc
Emily Perez
Google Inc
Karl Pertsch
Google Inc
Jornell Quiambao
Google Inc
Kanishka Rao
Google Inc
Michael S Ryoo
Google, Stony Brook University
Grecia Salazar
Google Inc
Pannag R Sanketi
Google Inc
Kevin Sayed
Google Inc
Jaspiar Singh
Google Inc
Sumedh Sontakke
Google Inc
Austin Stone
Google Inc
Clayton Tan
Google Inc
Huong Tran
Google Inc
Vincent Vanhoucke
Google Inc
Steve Vega
Google Inc
Quan H Vuong
Google Inc
Fei Xia
Google Inc
Ted Xiao
Google Inc
Peng Xu
Google Inc
Sichun Xu
Google Inc
Tianhe Yu
Google Brain
Brianna Zitkovich
Google Inc

Paper ID 25

Session 4. Large Data and Vision-Language Models for Robotics

Poster Session Tuesday, July 11

Poster 25

Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks to a high level of performance, either zero-shot or with small task-specific datasets. While this capability has been demonstrated in other fields such as computer vision, natural language processing, and speech recognition, it remains to be shown in robotics, where the generalization capabilities of models are particularly critical because real-world robotic data is difficult to collect. We argue that one of the keys to the success of such general robotic models lies in open-ended, task-agnostic training combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks.