Skip to main content

Showing 1–50 of 463 results for author: Levine, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.05941  [pdf, other

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.04714  [pdf, other

    cs.RO cs.AI cs.LG

    RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes

    Authors: Kyle Stachowicz, Sergey Levine

    Abstract: Reinforcement learning provides an appealing framework for robotic control due to its ability to learn expressive policies purely through real-world interaction. However, this requires addressing real-world constraints and avoiding catastrophic failures during training, which might severely impede both learning progress and the performance of the final policy. In many robotics settings, this amoun… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: In review, RSS 2024

  3. arXiv:2404.16823  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    Learning Visuotactile Skills with Two Multifingered Hands

    Authors: Toru Lin, Yu Zhang, Qiyang Li, Haozhi Qi, Brent Yi, Sergey Levine, Jitendra Malik

    Abstract: Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hard… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Code and Project Website: https://toruowo.github.io/hato/

  4. arXiv:2404.06474  [pdf, other

    cs.AI

    Autonomous Evaluation and Refinement of Digital Agents

    Authors: Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

    Abstract: We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control. We experiment with multiple evaluation models that trade off between inference cost, modularity of design, and accuracy. We validate the performance of these models in several popular benchmarks for digital agents, finding between 74.4 and 92.9% agreement with… ▽ More

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Code at https://github.com/Berkeley-NLP/Agent-Eval-Refine

  5. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  6. arXiv:2403.12910  [pdf, other

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/

  7. arXiv:2403.05612  [pdf, other

    cs.LG cs.AI cs.CL

    Unfamiliar Finetuning Examples Control How Language Models Hallucinate

    Authors: Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine

    Abstract: Large language models (LLMs) have a tendency to generate plausible-sounding yet factually incorrect responses, especially when queried on unfamiliar concepts. In this work, we explore the underlying mechanisms that govern how finetuned LLMs hallucinate. Our investigation reveals an interesting pattern: as inputs become more unfamiliar, LLM outputs tend to default towards a ``hedged'' prediction, w… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  8. arXiv:2403.04082  [pdf, other

    cs.LG stat.ML

    Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference

    Authors: Benjamin Eysenbach, Vivek Myers, Ruslan Salakhutdinov, Sergey Levine

    Abstract: Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?" These sorts of probabilistic inference questions are challenging when observations are high-dimensional. In this paper, we show how these questions can have compact, closed form solutions in terms of learned representations. The key idea is to apply a variant of contrastive learnin… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/vivekmyers/contrastive_planning

  9. arXiv:2403.03950  [pdf, other

    cs.LG cs.AI stat.ML

    Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

    Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

    Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  10. arXiv:2403.03174  [pdf, other

    cs.RO cs.AI

    MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting

    Authors: Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine

    Abstract: Open-vocabulary generalization requires robotic systems to perform tasks involving complex and diverse environments and task goals. While the recent advances in vision language models (VLMs) present unprecedented opportunities to solve unseen problems, how to utilize their emergent capabilities to control robots in the physical world remains an open question. In this paper, we present MOKA (Markin… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  11. arXiv:2403.00991  [pdf, other

    cs.RO cs.CV cs.LG

    SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation

    Authors: Noriaki Hirose, Dhruv Shah, Kyle Stachowicz, Ajay Sridhar, Sergey Levine

    Abstract: Autonomous self-improving robots that interact and improve with experience are key to the real-world deployment of robotic systems. In this paper, we propose an online learning method, SELFI, that leverages online robot experience to rapidly fine-tune pre-trained control policies efficiently. SELFI applies online model-free reinforcement learning on top of offline model-based learning to bring out… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 11pages, 13 figures, 2 tables

  12. arXiv:2402.19446  [pdf, other

    cs.LG cs.AI cs.CL

    ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

    Authors: Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

    Abstract: A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs to not just generate completions for a given prompt, but rather make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a genera… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  13. arXiv:2402.19432  [pdf, other

    cs.RO

    Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation

    Authors: Jonathan Yang, Catherine Glossop, Arjun Bhorkar, Dhruv Shah, Quan Vuong, Chelsea Finn, Dorsa Sadigh, Sergey Levine

    Abstract: Recent years in robotics and imitation learning have shown remarkable progress in training large-scale foundation models by leveraging data across a multitude of embodiments. The success of such policies might lead us to wonder: just how diverse can the robots in the training set be while still facilitating positive transfer? In this work, we study this question in the context of heterogeneous emb… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 16 pages, 9 figures

    MSC Class: 68T40 ACM Class: I.2.9

  14. arXiv:2402.17135  [pdf, other

    cs.LG cs.AI

    Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

    Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine

    Abstract: Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their sta… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  15. arXiv:2402.16359  [pdf, other

    cs.LG cs.AI q-bio.QM stat.ML

    Feedback Efficient Online Fine-Tuning of Diffusion Models

    Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

    Abstract: Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example, we may want to generate images with high aesthetic quality, or molecules with high bioactivity. It is natural to frame this as a reinforcement learning (RL) prob… ▽ More

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Under review (codes will be released soon)

  16. arXiv:2402.15567  [pdf, other

    cs.LG cs.AI cs.RO

    Foundation Policies with Hilbert Representations

    Authors: Seohong Park, Tobias Kreiman, Sergey Levine

    Abstract: Unsupervised and self-supervised objectives, such as next token prediction, have enabled pre-training generalist models from large amounts of unlabeled data. In reinforcement learning (RL), however, finding a truly general and scalable unsupervised pre-training objective for generalist policies from offline data remains a major open question. While a number of methods have been proposed to enable… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  17. arXiv:2402.15194  [pdf, other

    cs.LG cs.AI stat.ML

    Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control

    Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine

    Abstract: Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images or the functional properties of generated proteins. Diffusion models can be finetuned in a goal… ▽ More

    Submitted 28 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Under review (codes will be released soon)

  18. arXiv:2402.07872  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

    Authors: Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

    Abstract: Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we ena… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  19. arXiv:2402.02651  [pdf, other

    cs.LG cs.AI cs.CV

    Vision-Language Models Provide Promptable Representations for Reinforcement Learning

    Authors: William Chen, Oier Mees, Aviral Kumar, Sergey Levine

    Abstract: Humans can quickly learn new behaviors by leveraging background world knowledge. In contrast, agents trained with reinforcement learning (RL) typically learn behaviors from scratch. We thus propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied RL. We initialize policies w… ▽ More

    Submitted 13 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  20. arXiv:2401.16889  [pdf, other

    cs.RO cs.AI eess.SY

    Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

    Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a n… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

  21. arXiv:2401.16013  [pdf, other

    cs.RO cs.AI

    SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

    Authors: Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine

    Abstract: In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementati… ▽ More

    Submitted 12 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: ICRA 2024

  22. arXiv:2401.12963  [pdf, other

    cs.RO cs.AI cs.CL cs.CV cs.LG

    AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

    Authors: Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Isabel Leal, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao , et al. (3 additional authors not shown)

    Abstract: Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is the lack of data grounded in the physical world. In this paper, we propose AutoRT, a system that leverages existing foundation models to scale up the d… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 26 pages, 9 figures

  23. arXiv:2401.08553  [pdf, other

    cs.RO

    FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning

    Authors: Jianlan Luo, Charles Xu, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, Sergey Levine

    Abstract: In this paper, we propose a real-world benchmark for studying robotic learning in the context of functional manipulation: a robot needs to accomplish complex long-horizon behaviors by composing individual manipulation skills in functionally relevant ways. The core design principles of our Functional Manipulation Benchmark (FMB) emphasize a harmonious balance between complexity and accessibility. T… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

  24. arXiv:2401.05442  [pdf, other

    cs.LG cs.AI

    Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

    Authors: Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

    Abstract: While machine learning models are typically trained to solve prediction problems, we might often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to optimize for a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those i… ▽ More

    Submitted 11 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  25. arXiv:2312.04474  [pdf, other

    cs.CL cs.AI cs.LG cs.RO

    Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

    Authors: Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter

    Abstract: Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for semantic ones (and in particular, those that are a mix of both). For example, consider prompting an… ▽ More

    Submitted 7 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

  26. arXiv:2311.18232  [pdf, other

    cs.CL cs.AI cs.LG

    LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

    Authors: Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

    Abstract: Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necessitate considerable prompt tuning. This becomes particularly apparent in multi-turn conversations: even the best current LLMs rarely ask clarifying questions, engage in explicit information gathering,… ▽ More

    Submitted 29 November, 2023; originally announced November 2023.

  27. arXiv:2311.12996  [pdf, other

    cs.AI cs.RO

    RLIF: Interactive Imitation Learning as Reinforcement Learning

    Authors: Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine

    Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correc… ▽ More

    Submitted 18 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  28. arXiv:2311.05584  [pdf, other

    cs.LG cs.AI cs.CL

    Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

    Authors: Joey Hong, Sergey Levine, Anca Dragan

    Abstract: Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks. However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome. For example, a teacher might try to understand their student's current comprehension level to tailor their instruction accordingly, and… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 25 pages, 6 figures

  29. arXiv:2311.05067  [pdf, other

    cs.LG cs.AI stat.ML

    Accelerating Exploration with Unlabeled Prior Data

    Authors: Qiyang Li, Jason Zhang, Dibya Ghosh, Amy Zhang, Sergey Levine

    Abstract: Learning to solve tasks from a sparse reward signal is a major challenge for standard reinforcement learning (RL) algorithms. However, in the real world, agents rarely need to solve sparse reward tasks entirely from scratch. More often, we might possess prior experience to draw on that provides considerable guidance about which actions and outcomes are possible in the world, which we can use to ex… ▽ More

    Submitted 20 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 25 pages, 16 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  30. arXiv:2311.01059  [pdf, other

    cs.RO cs.LG

    Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

    Authors: Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

    Abstract: To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to sele… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 19 pages, 6 figures

  31. arXiv:2310.20663  [pdf, other

    cs.LG cs.AI

    Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

    Authors: Joey Hong, Anca Dragan, Sergey Levine

    Abstract: Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" together the best parts of otherwise suboptimal trajectories that overlap on similar states, to create new behaviors where each individual state is in-distribution, but the overall returns are higher. However, in m… ▽ More

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 21 pages, 4 figures

  32. arXiv:2310.17634  [pdf, other

    cs.RO cs.AI cs.LG

    Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion

    Authors: Laura Smith, Yunhao Cao, Sergey Levine

    Abstract: Deep reinforcement learning (RL) can enable robots to autonomously acquire complex behaviors, such as legged locomotion. However, RL in the real world is complicated by constraints on efficiency, safety, and overall training stability, which limits its practical applicability. We present APRL, a policy regularization framework that modulates the robot's exploration over the course of training, str… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: First two authors contributed equally. Project website: https://sites.google.com/berkeley.edu/aprl

  33. arXiv:2310.11731  [pdf, other

    cs.AI

    Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

    Authors: Jianlan Luo, Perry Dong, Jeffrey Wu, Aviral Kumar, Xinyang Geng, Sergey Levine

    Abstract: The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. While policy constraints, conservatism, and other methods for mitigating distributional shifts have made offline reinforcement learning more effective, the continuous action setting often necessitates various a… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

  34. arXiv:2310.10639  [pdf, other

    cs.RO

    Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

    Authors: Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine

    Abstract: If generalist robots are to operate in truly unstructured environments, they need to be able to recognize and reason about novel objects and scenarios. Such objects and scenarios might not be present in the robot's own training data. We propose SuSIE, a method that leverages an image-editing diffusion model to act as a high-level planner by proposing intermediate subgoals that a low-level controll… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 22 pages, 8 figures

  35. arXiv:2310.10103  [pdf, other

    cs.RO cs.AI cs.CL cs.LG

    Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning

    Authors: Dhruv Shah, Michael Equi, Blazej Osinski, Fei Xia, Brian Ichter, Sergey Levine

    Abstract: Navigation in unfamiliar environments presents a major challenge for robots: while mapping and planning techniques can be used to build up a representation of the world, quickly discovering a path to a desired goal in unfamiliar settings with such methods often requires lengthy mapping and exploration. Humans can rapidly navigate new environments, particularly indoor environments that are laid out… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Videos, code, and an interactive Colab notebook that runs in your browser https://sites.google.com/view/lfg-nav/

  36. arXiv:2310.10056  [pdf, other

    cs.LG

    Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction

    Authors: Han Qi, Xinyang Geng, Stefano Rando, Iku Ohama, Aviral Kumar, Sergey Levine

    Abstract: In computational chemistry, crystal structure prediction (CSP) is an optimization problem that involves discovering the lowest energy stable crystal structure for a given chemical formula. This problem is challenging as it requires discovering globally optimal designs with the lowest energies on complex manifolds. One approach to tackle this problem involves building simulators based on density fu… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

  37. arXiv:2310.08887  [pdf, other

    cs.LG cs.AI cs.RO

    METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

    Authors: Seohong Park, Oleh Rybkin, Sergey Levine

    Abstract: Unsupervised pre-training strategies have proven to be highly effective in natural language processing and computer vision. Likewise, unsupervised reinforcement learning (RL) holds the promise of discovering a variety of potentially useful behaviors that can accelerate the learning of a wide array of downstream tasks. Previous unsupervised RL approaches have mainly focused on pure exploration and… ▽ More

    Submitted 9 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  38. arXiv:2310.08864  [pdf, other

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, Anthony Brohan, Antonin Raffin , et al. (256 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More

    Submitted 2 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  39. arXiv:2310.08558  [pdf, other

    cs.LG cs.AI cs.RO

    Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

    Authors: Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn

    Abstract: It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pess… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

  40. arXiv:2310.07896  [pdf, other

    cs.RO cs.CV cs.LG

    NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration

    Authors: Ajay Sridhar, Dhruv Shah, Catherine Glossop, Sergey Levine

    Abstract: Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this pa… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Project page https://general-navigation-models.github.io/nomad/

  41. arXiv:2310.00873  [pdf, other

    cs.LG

    Deep Neural Networks Tend To Extrapolate Predictably

    Authors: Katie Kang, Amrith Setlur, Claire Tomlin, Sergey Levine

    Abstract: Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly O… ▽ More

    Submitted 15 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  42. arXiv:2309.13041  [pdf, other

    cs.RO cs.CV cs.LG

    Robotic Offline RL from Internet Videos via Value-Function Pre-Training

    Authors: Chethan Bhateja, Derek Guo, Dibya Ghosh, Anikait Singh, Manan Tomar, Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar

    Abstract: Pre-training on Internet data has proven to be a key ingredient for broad generalization in many modern ML systems. What would it take to enable such capabilities in robotic reinforcement learning (RL)? Offline RL methods, which learn from datasets of robot experience, offer one way to leverage prior data into the robotic learning pipeline. However, these methods have a "type mismatch" with video… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: First three authors contributed equally

  43. arXiv:2309.10150  [pdf, other

    cs.RO cs.AI cs.LG

    Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

    Authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singht, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine

    Abstract: In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizi… ▽ More

    Submitted 17 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: See website at https://qtransformer.github.io

  44. arXiv:2309.03839  [pdf, other

    cs.RO cs.HC cs.LG

    Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning

    Authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine

    Abstract: Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practi… ▽ More

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

  45. arXiv:2309.03322  [pdf, other

    cs.LG cs.AI cs.RO

    REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

    Authors: Zheyuan Hu, Aaron Rovinsky, Jianlan Luo, Vikash Kumar, Abhishek Gupta, Sergey Levine

    Abstract: Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control large degrees of freedom. Reinforcement learning (RL) offers a promising appr… ▽ More

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted at CORL 2023. The first two authors contributed equally

  46. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

    Authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi

    Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve A… ▽ More

    Submitted 2 April, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 38

    Journal ref: Vol. 38 No. 18: AAAI-24 Technical Tracks 18; 2024; 19937-19947

  47. arXiv:2308.12952  [pdf, other

    cs.RO cs.LG

    BridgeData V2: A Dataset for Robot Learning at Scale

    Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

    Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains,… ▽ More

    Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 9 pages

  48. arXiv:2307.15818  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal , et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.… ▽ More

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  49. arXiv:2307.13101  [pdf, other

    cs.LG cs.AI cs.RO

    Contrastive Example-Based Control

    Authors: Kyle Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn

    Abstract: While many real-world problems that might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples from the transition dynamics and examples of high-return states.… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: This is an updated version of a manuscript that originally appeared at L4DC 2023. The project website is here https://sites.google.com/view/laeo-rl

    Journal ref: Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:155-169, 2023

  50. arXiv:2307.12968  [pdf, other

    cs.LG cs.AI

    A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

    Authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov

    Abstract: As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted to ICML 2023. Video (https://www.youtube.com/watch?v=1xlixIHZ0R4) and code (https://github.com/ben-eysenbach/ac-connection)