Imitation learning is a powerful approach for robots to acquire manipulation skills. Nevertheless, many real-world manipulation tasks involve precise and dexterous robot-object interactions, which make it difficult for humans to collect high-quality expert demonstrations. As a result, a robot has to learn skills from suboptimal demonstrations and unstructured interactions, which remains a key challenge. Existing works typically use offline deep reinforcement learning (RL) to address this challenge, but in practice these algorithms are unstable and fragile due to the deadly triad issue. To overcome this problem, we propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. We first use pretrained representations to organize the interaction experience into a graph and perform a graph search to calculate the values of different behaviors. Then, we apply a retrieval-based procedure to identify the best behavior (actions) at each state and use behavior cloning to learn that behavior. We evaluate our method on both simulated and real-world robotic manipulation tasks with complex visual inputs, covering various precise and dexterous manipulation skills with objects of different physical properties. GSR achieves a 10% to 30% higher success rate and over 30% higher proficiency than baselines.
How can a robot learn from suboptimal demonstrations in challenging manipulation tasks?
For many real-world robotic manipulation tasks, it is very challenging for humans to collect expert-level demonstrations for imitation learning.
In the video, the gripper is teleoperated by a human to tie a band onto a pair of wheels. The process contains numerous failures, and directly imitating this demonstration leads to a poor control policy.
The video is played at 5x speed.
At a high level, our method is weighted behavior cloning: we assign higher weights to the good behaviors in the suboptimal dataset, and we implement this weighting procedure with graph search and retrieval. We first organize the demonstrations into a graph using frozen or finetuned pretrained representations. Then, we calculate the value of each transition through a graph search. Finally, for each node (state) in the dataset, we retrieve its nearest neighbors and assign higher weights to the good (high-value) and relevant (similar) retrieved transitions, as in the sketch below.
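The Python sketch below illustrates this pipeline under simplifying assumptions; it is not the paper's exact implementation. The helper names (build_graph, graph_search_values, retrieval_weights) are hypothetical, the value of a state is defined here as gamma raised to the shortest-path distance to a goal state, and the hyperparameters (k, gamma, sigma, beta) are illustrative choices.

```python
import numpy as np
from collections import deque

def pairwise_dists(embs):
    """Pairwise Euclidean distances between pretrained embeddings.
    embs: (n, d) numpy array. O(n^2) memory; fine for a sketch."""
    return np.linalg.norm(embs[:, None] - embs[None, :], axis=-1)

def build_graph(embs, traj_ids, k=5):
    """Edges: temporal successor within each trajectory, plus links to the
    k nearest neighbors in embedding space (which connect trajectories)."""
    n = len(embs)
    adj = [[] for _ in range(n)]
    for i in range(n - 1):
        if traj_ids[i] == traj_ids[i + 1]:      # temporal edge
            adj[i].append(i + 1)
    dists = pairwise_dists(embs)
    for i in range(n):
        for j in np.argsort(dists[i])[1:k + 1]:  # skip self at index 0
            adj[i].append(int(j))
    return adj

def graph_search_values(adj, goal_idx, gamma=0.98):
    """Value of a state = gamma ** (shortest path length to any goal state),
    computed with a backward BFS from the goal states over the graph."""
    n = len(adj)
    radj = [[] for _ in range(n)]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            radj[j].append(i)
    steps = np.full(n, np.inf)
    q = deque(goal_idx)
    for g in goal_idx:
        steps[g] = 0
    while q:
        j = q.popleft()
        for i in radj[j]:
            if steps[i] == np.inf:
                steps[i] = steps[j] + 1
                q.append(i)
    return np.where(np.isinf(steps), 0.0, gamma ** steps)

def retrieval_weights(embs, values, k=10, sigma=1.0, beta=5.0):
    """For each state, retrieve its k nearest neighbors and weight each
    retrieved transition by similarity * exp(beta * value)."""
    n = len(embs)
    w = np.zeros(n)
    dists = pairwise_dists(embs)
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k]
        sim = np.exp(-dists[i, nbrs] ** 2 / (2 * sigma ** 2))
        w[nbrs] += sim * np.exp(beta * values[nbrs])
    return w / w.sum()
```

The resulting per-transition weights w_i can then be plugged into a weighted behavior cloning objective, e.g. minimizing the sum over the dataset of w_i * ||pi(s_i) - a_i||^2.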
We consider several robotic manipulation tasks involving fine-grained motions, for which we can only collect suboptimal human demonstrations. We find that baseline methods copy the suboptimal modes in the demonstrations and get stuck during task execution. In contrast, our method learns proficient behavior and avoids such failures.
Best Baseline (IQL)
Ours
@article{yin2024offline,
title = {Offline Imitation Learning through Graph Search and Retrieval},
author = {Yin, Zhao-Heng and Abbeel, Pieter},
journal = {Robotics: Science and Systems},
year = {2024},
}