Geometric Retargeting

A Principled, Ultrafast Hand Retargeting Algorithm

Zhao-Heng Yin1,2, Changhao Wang2, Luis Pineda2, Krishna Bodduluri2,
Tingfan Wu2, Pieter Abbeel1, and Mustafa Mukadam2

1BAIR, Berkeley EECS and 2FAIR at Meta

Smooth and Expressive Retargeting

Kinematic retargeting is central to teleoperation. Geometric Retargeting, the unsung hero behind our Dexterity Gen, answers a fundamental question: What makes good retargeting?


Retargeting as a Shape Correspondence Problem

Human and robot fingertips have significantly different ranges of motion, as illustrated in the comparison below. Retargeting therefore amounts to establishing a non-linear shape correspondence between the two workspaces. The question is how to define that correspondence precisely.

[Figure: comparison of human and robot fingertip ranges of motion]

Make Retargeting Principled

We introduce five geometric retargeting principles to optimize the operator's experience: Motion Preservation, Robot Utility Maximization, High Flatness Control, Pinch Correspondence, and Collision Minimization. These principles are independent of each other and form a minimal basis for a good retargeting function.

Explore them in the gallery below.


Principle I: Motion Preservation

When the human fingertip moves in a certain direction, the robot fingertip should do the same.
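One way to turn this principle into a trainable cost is to compare the direction of a small human fingertip displacement with the direction of the resulting robot fingertip displacement. The sketch below is a minimal illustration under that assumption, not the exact loss used in GeoRT; the function name `motion_preservation_loss` and the cosine-similarity form are hypothetical choices.

```python
import numpy as np

def motion_preservation_loss(human_delta, robot_delta, eps=1e-8):
    """Mean (1 - cosine similarity) between paired displacement vectors.

    human_delta, robot_delta: arrays of shape (N, 3), where row i holds the
    displacement of the human fingertip and of the retargeted robot fingertip
    for the same small motion. (Hypothetical formulation for illustration.)
    """
    human_delta = np.asarray(human_delta, dtype=float)
    robot_delta = np.asarray(robot_delta, dtype=float)
    dot = np.sum(human_delta * robot_delta, axis=1)
    norms = np.linalg.norm(human_delta, axis=1) * np.linalg.norm(robot_delta, axis=1)
    cos = dot / (norms + eps)  # eps guards against zero-length displacements
    return float(np.mean(1.0 - cos))
```

The loss is close to 0 when the robot fingertip moves in the same direction as the human fingertip, 1 when the motions are orthogonal, and close to 2 when they are opposite. It ignores the magnitude of the motion on purpose, since the two hands have different workspace sizes.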


Principle II: Utility Maximization

The retargeting map should be a surjection: every robot fingertip configuration should be reachable from some human hand pose. We implement this as a Chamfer loss.
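For readers unfamiliar with the Chamfer loss, the sketch below computes a symmetric, squared Chamfer distance between two point sets, for example the retargeted fingertip positions and points sampled from the robot's reachable fingertip workspace. The symmetric squared form and the name `chamfer_distance` are assumptions made for illustration; the exact variant used in GeoRT is not specified here.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric squared Chamfer distance between point sets a (N, D) and b (M, D)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # For each point in a, distance to its nearest neighbour in b, and vice versa.
    a_to_b = d2.min(axis=1).mean()
    b_to_a = d2.min(axis=0).mean()
    return float(a_to_b + b_to_a)
```

The term that measures the distance from each workspace sample to its nearest retargeted point is the one that rewards coverage: it is small only if the retargeted points reach every part of the robot workspace.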


Principle III: High Flatness Control

The control sensitivity should be nearly uniform everywhere.
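One possible reading of this principle is that the Jacobian of the retargeting map should change as little as possible across the workspace, which is equivalent to keeping its second derivatives small. The sketch below estimates that with central finite differences along random directions. Both this interpretation and the names `flatness_penalty`, `f`, and `step` are assumptions for illustration, not the GeoRT formulation.

```python
import numpy as np

def flatness_penalty(f, points, step=1e-2, seed=0):
    """Mean squared second directional difference of a map f at the given points.

    f: hypothetical retargeting map taking an (N, 3) array of human fingertip
       positions and returning an (N, 3) array of robot fingertip positions.
    points: (N, 3) array of sample locations.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # One random unit direction per sample point.
    directions = rng.normal(size=points.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Central second difference: zero for any affine map.
    second_diff = (
        f(points + step * directions) - 2.0 * f(points) + f(points - step * directions)
    ) / step**2
    return float(np.mean(np.sum(second_diff**2, axis=1)))
```

An affine map has constant sensitivity and receives a penalty of essentially zero, while a map that stretches some regions more than others is penalized.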


Principle IV: Pinch Correspondence

The pinch grasp is a critical event: when two human fingers pinch together, the corresponding robot fingers should pinch as well.
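A simple way to express this as a cost is to detect frames where two human fingertips are closer than a threshold and penalize the gap between the corresponding robot fingertips in those frames. The sketch below assumes this formulation; the name `pinch_loss` and the 1.5 cm threshold are hypothetical values chosen for illustration.

```python
import numpy as np

def pinch_loss(human_a, human_b, robot_a, robot_b, threshold=0.015):
    """Mean squared robot fingertip gap over frames where the human pinches.

    All inputs are (N, 3) arrays of fingertip positions in metres.
    threshold is an assumed pinch-detection distance, not a value from GeoRT.
    """
    human_gap = np.linalg.norm(
        np.asarray(human_a, dtype=float) - np.asarray(human_b, dtype=float), axis=1
    )
    robot_gap = np.linalg.norm(
        np.asarray(robot_a, dtype=float) - np.asarray(robot_b, dtype=float), axis=1
    )
    pinching = human_gap < threshold
    if not np.any(pinching):
        return 0.0  # no pinch frames, nothing to penalize
    return float(np.mean(robot_gap[pinching] ** 2))
```

Frames in which the human fingers are apart do not contribute, so the term only constrains the map near pinch events.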


Principle V: Collision Minimization

The retargeting function should not drive the robot hand into collisions.
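As a rough illustration of a differentiable collision cost, the sketch below approximates each fingertip by a sphere and penalizes pairwise overlap. This is a deliberately simplified stand-in: a real hand model would check full link geometry, and the name `sphere_collision_penalty` and the 1 cm radius are assumptions.

```python
import numpy as np

def sphere_collision_penalty(centers, radius=0.01):
    """Sum of squared penetration depths between equal-radius spheres.

    centers: (K, 3) array of sphere centres in metres, one per fingertip.
    radius: assumed sphere radius (hypothetical value).
    """
    centers = np.asarray(centers, dtype=float)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Visit each unordered pair once via the upper triangle.
    i, j = np.triu_indices(len(centers), k=1)
    penetration = np.maximum(0.0, 2.0 * radius - dist[i, j])
    return float(np.sum(penetration**2))
```

The penalty is zero whenever every pair of spheres is separated by at least two radii and grows smoothly with the overlap depth.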


Make Retargeting Fast

We implement our retargeting model as a neural network and train it with the principles above expressed as differentiable cost functions (see the figure).
It is fast at every stage: training takes 5-10 minutes, and inference runs at 1 kHz.

[Figure: the retargeting network and its differentiable cost functions]

GeoRT Understands You Better

GeoRT is highly expressive and follows human intention closely. It also makes many hand poses that are useful for in-hand manipulation easy to reach.

These advantages give human operators a greater sense of agency, allowing them to maintain better control over grasping.


Conclusion

Retargeting is at the core of human-driven, cross-embodiment robot learning. In our experiments, we noticed that even the smallest change in retargeting can make a big difference. We hope that our GeoRT algorithm can inspire new ideas and designs in future human-in-the-loop learning systems.