Offline Imitation Learning through Graph Search and Retrieval

Imitation learning is a powerful approach for robots to acquire manipulation skills. Nevertheless, many real-world manipulation tasks involve precise and dexterous robot-object interactions, which make it difficult for humans to collect high-quality expert demonstrations. As a result, a robot has to learn skills from suboptimal demonstrations and unstructured interactions, which remains a key challenge. Existing works typically use offline deep reinforcement learning (RL) to address this challenge, but in practice these algorithms are unstable and fragile due to the deadly triad issue. To overcome this problem, we propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. We first use a pretrained representation to organize the interaction experience into a graph and perform a graph search to calculate the values of different behaviors. Then, we apply a retrieval-based procedure to identify the best behavior (actions) in each state and use behavior cloning to learn that behavior. We evaluate our method on both simulated and real-world robotic manipulation tasks with complex visual inputs, covering various precise and dexterous manipulation skills with objects of different physical properties. GSR achieves a 10% to 30% higher success rate and over 30% higher proficiency than the baselines.

Problem

Method

At a high level, our method is based on weighted behavior cloning: we place more weight on the good behavior in the suboptimal dataset. We propose to use graph search and retrieval to implement this weighting procedure. We first organize the demonstrations into a graph using frozen or finetuned pretrained representations. Then, we calculate the value of each transition through a graph search. Finally, for each node (state) in the dataset, we retrieve its nearest neighbors and give more weight to the retrieved transitions that are both good (high value) and relevant (similar).
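The three steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name and design choice in it is an assumption made for illustration: states within a hypothetical `radius` in embedding space are treated as connected at zero cost, each recorded transition costs one step, every trajectory is assumed to end at the goal, the value of a transition is the negative number of steps to the goal, and retrieval weights are a softmax over value and embedding distance with hypothetical temperatures `value_temp` and `dist_temp`.

```python
import numpy as np


def pairwise_dist(emb):
    """Euclidean distance between every pair of state embeddings."""
    return np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)


def steps_to_goal(dist, next_idx, is_goal, radius):
    """Graph search: fewest recorded transitions from each node to any goal.

    Assumed graph: a cost-1 edge for each dataset transition i -> next_idx[i],
    plus cost-0 edges between states whose embeddings are within `radius`.
    """
    n = len(next_idx)
    cost = np.where(is_goal, 0.0, np.inf)
    near = dist <= radius
    has_next = next_idx >= 0
    for _ in range(n):  # Bellman-Ford style sweeps; at most n - 1 are needed
        new = cost.copy()
        # Follow the recorded transition: one step plus the successor's cost.
        new[has_next] = np.minimum(new[has_next], 1.0 + cost[next_idx[has_next]])
        # Jump for free to any state with a nearly identical embedding.
        new = np.minimum(new, np.where(near, cost[None, :], np.inf).min(axis=1))
        if np.array_equal(new, cost):
            break
        cost = new
    return cost


def transition_values(cost, next_idx):
    """Value of each transition: negative steps to goal when taking it."""
    q = np.full(len(next_idx), -np.inf)  # terminal states have no transition
    has_next = next_idx >= 0
    q[has_next] = -(1.0 + cost[next_idx[has_next]])
    return q


def retrieval_weights(dist, q, k, value_temp, dist_temp):
    """For each state, retrieve k nearest transitions and weight them.

    Weights favor transitions that are high value and close in embedding space.
    """
    n = len(q)
    # States without an outgoing transition cannot be retrieved.
    d = np.where(np.isfinite(q)[None, :], dist, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    logits = q[nbrs] / value_temp - d[rows, nbrs] / dist_temp
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(logits)
    return nbrs, w / w.sum(axis=1, keepdims=True)


# Toy data (hypothetical): 1-D positions stand in for pretrained embeddings.
# Trajectory A goes 0 -> 1 -> 2 -> 3; trajectory B backtracks:
# 0 -> 1 -> 0 -> 1 -> 2 -> 3.
emb = np.array([0, 1, 2, 3, 0, 1, 0, 1, 2, 3], dtype=float)[:, None]
next_idx = np.array([1, 2, 3, -1, 5, 6, 7, 8, 9, -1])
is_goal = next_idx < 0  # assumption: every trajectory ends at the goal

dist = pairwise_dist(emb)
cost = steps_to_goal(dist, next_idx, is_goal, radius=0.1)
q = transition_values(cost, next_idx)
nbrs, w = retrieval_weights(dist, q, k=3, value_temp=0.5, dist_temp=1.0)
```

In this toy dataset, the backtracking transition recorded at position 1 (index 5) receives a much smaller weight than the two forward transitions retrieved for the same state, so a behavior cloning loss weighted by `w` would mostly imitate the forward action. A real pipeline would replace the dense pairwise distance matrix with an approximate nearest-neighbor index once the dataset grows.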

Real World Videos

We consider several robotic manipulation tasks involving fine-grained motions. We are only able to collect suboptimal human demonstrations for these tasks. We find that baseline methods can copy the suboptimal modes in the demonstrations and get stuck during task execution. In contrast, our method is able to learn proficient behavior and avoid this kind of failure.
