Mana: Dexterous Manipulation of Articulated Tools
UC Berkeley¹ · CMU² · Stanford University³ · Amazon FAR⁴
* The last three authors contributed equally.
Mana (Manipulation Animator) is a sim-to-real system for learning dexterous manipulation of articulated tools.
Abstract
Articulated tool manipulation remains a major challenge in dexterous robotics because it requires coordinating the tool's internal degrees of freedom with contact-rich interactions. Prior work has largely focused on rigid objects, and articulated tool use remains underexplored because of its physical complexity and the difficulty of learning functional manipulation strategies. To address this challenge, we present Mana (Manipulation Animator), a general sim-to-real framework that reformulates dexterous manipulation as an animation problem. Inspired by computer animation, Mana employs a coarse-to-fine pipeline that transforms procedurally generated grasp keyframes into manipulation trajectories through motion planning and reinforcement learning. The data generation process is largely automatic, requiring only a few mouse clicks to specify functional affordances (under one minute per tool). Across four articulated tools spanning different scales and joint types, Mana achieves zero-shot sim-to-real transfer for both grasping and in-hand manipulation, demonstrating a scalable approach to dexterous articulated tool use.
Physical Challenges
The problem is harder than we anticipated. Beyond the precision required to functionally grasp a 1 cm-thick object from a tabletop, success is highly sensitive to the applied force. The fingers must apply precise contact forces that stay within the friction cone to actuate the tool stably, and both the direction and the magnitude of these forces must be continuously adjusted as the tool moves in order to maintain balance. Furthermore, the hand and the tool form a tightly coupled dynamic system, which makes control particularly challenging. We also find existing position-based teleoperation systems inadequate for this force-sensitive problem.
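To make the friction-cone requirement concrete, the sketch below checks whether a single fingertip force lies inside a Coulomb friction cone. It is an illustrative example, not Mana's controller; the friction coefficient, the contact-normal convention, and the function name are assumptions.

```python
import numpy as np

def in_friction_cone(force, normal, mu):
    """Return True if a contact force lies inside the Coulomb friction cone.

    force:  3-vector, force applied by the fingertip on the tool
    normal: 3-vector, contact normal pointing into the tool (assumed convention)
    mu:     friction coefficient (assumed value, not from the source)
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    f = np.asarray(force, dtype=float)
    f_n = float(np.dot(f, n))                # normal component: must push, not pull
    f_t = float(np.linalg.norm(f - f_n * n))  # magnitude of the tangential component
    return f_n > 0.0 and f_t <= mu * f_n

# The same force is stable on a flat contact but slips once the surface tilts 30 degrees.
flat = [0.0, 0.0, 1.0]
tilted = [np.sin(np.pi / 6), 0.0, np.cos(np.pi / 6)]
print(in_friction_cone([0.0, 0.0, 1.0], flat, mu=0.5))
print(in_friction_cone([0.0, 0.0, 1.0], tilted, mu=0.5))
```

With mu = 0.5, a vertical push that is stable on a flat contact leaves the cone once the contact normal tilts by 30 degrees (tan 30° ≈ 0.58 > 0.5). This is why the force direction has to track the tool's motion rather than stay fixed.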
Mana Data System
Learning functional articulated tool manipulation directly through reinforcement learning is challenging because of difficulties in both exploration and reward design. Inspired by computer animation, Mana takes a coarse-to-fine approach to generating dexterous manipulation data for policy learning. Given human affordance annotations, Mana scaffolds the entire tool manipulation process with numerous procedurally generated grasp keyframes, then uses motion planning (MP) and reinforcement learning (RL) to synthesize the manipulation trajectories between these keyframes (i.e., inbetweening).
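As a rough picture of the coarse-to-fine idea, the sketch treats keyframes as a sparse scaffold and fills in the frames between them. Plain linear interpolation stands in for the inbetweening step here. In Mana that step is performed by MP and RL, which must also respect contacts and dynamics, so this is only a placeholder under that simplification. The keyframe representation (joint-configuration vectors) and the function name are hypothetical.

```python
import numpy as np

def inbetween_linear(keyframes, steps_per_segment):
    """Densify a sequence of hand keyframes by linear interpolation.

    keyframes: array of shape (K, D), e.g. K hand joint configurations
               (hypothetical representation, not Mana's keyframe format).
    steps_per_segment: number of frames generated per keyframe pair (>= 1).
    Returns an array of shape ((K - 1) * steps_per_segment + 1, D) that starts
    at the first keyframe and ends exactly at the last one.
    """
    keyframes = np.asarray(keyframes, dtype=float)
    frames = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        # t runs over [0, 1), so each keyframe is emitted once, at the start of its segment
        for t in np.arange(steps_per_segment) / steps_per_segment:
            frames.append((1.0 - t) * a + t * b)
    frames.append(keyframes[-1])  # close the trajectory on the final keyframe
    return np.stack(frames)
```

Linear interpolation ignores collisions and contact forces entirely, which is exactly the gap that the MP and RL stages are meant to close.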
The enhanced Lightning Grasp system is the core of the entire data system. By supplying the grasp keyframes, it directly reduces the long-horizon problem to a sequence of simpler subproblems, which simplifies RL exploration. It also improves the policy's robustness by generating diverse grasp configurations as training data, covering the various contact modes the hand might encounter during manipulation.
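One way to read "reducing the long-horizon problem to simpler subproblems" is sketched below. Each pair of consecutive keyframes defines a short task with a dense distance reward, and episodes start from randomly chosen keyframes so that every segment and its contact configuration appear in training. The reward form, state representation, and function names are hypothetical. The source does not describe Mana's reward.

```python
import numpy as np

def subgoal_reward(state, keyframes, k):
    """Hypothetical dense reward for subproblem k: move from keyframes[k] to keyframes[k + 1]."""
    goal = np.asarray(keyframes[k + 1], dtype=float)
    return -float(np.linalg.norm(np.asarray(state, dtype=float) - goal))

def sample_episode_start(keyframes, rng):
    """Start an episode at a random keyframe (never the last one), so each
    short-horizon segment is practiced instead of only the full task."""
    k = int(rng.integers(len(keyframes) - 1))  # k in [0, K - 2]
    return k, np.asarray(keyframes[k], dtype=float)

keyframes = [[0.0, 0.0], [3.0, 4.0], [3.0, 8.0]]
rng = np.random.default_rng(0)
k, start = sample_episode_start(keyframes, rng)
print(k, subgoal_reward(start, keyframes, k))
```

Starting episodes at intermediate keyframes means the agent never has to discover a long chain of contact transitions from scratch; each segment only asks it to reach a nearby, already valid grasp.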
Visuomotor Policy
We train a point-cloud-based visuomotor policy on trajectories produced by the Mana Data System. We use foundation perception models to extract the tool's point cloud, which substantially narrows the sim-to-real gap in visual observations.
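A common way to obtain a tool-only point cloud is to segment the tool in the image and back-project the masked depth pixels through the camera intrinsics. The source does not name the perception models or the camera setup. The pinhole back-projection and farthest point sampling below are standard techniques chosen for illustration, and the mask is assumed to come from some off-the-shelf segmentation model.

```python
import numpy as np

def masked_depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project the pixels selected by a segmentation mask into a 3D point cloud.

    depth: (H, W) depth image in meters.
    mask:  (H, W) boolean tool mask (assumed to come from a segmentation model).
    fx, fy, cx, cy: pinhole camera intrinsics.
    Returns an (N, 3) array of points in the camera frame.
    """
    v, u = np.nonzero(mask & (depth > 0))  # row (v) and column (u) indices of valid tool pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def farthest_point_sample(points, n, seed=0):
    """Downsample to n points with greedy farthest point sampling (a common choice, assumed here)."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[idx[0]], axis=1)  # distance to the nearest chosen point
    for _ in range(n - 1):
        idx.append(int(np.argmax(dist)))
        dist = np.minimum(dist, np.linalg.norm(points - points[idx[-1]], axis=1))
    return points[idx]
```

Because only masked tool pixels are kept, the background, the table, and the hand never enter the observation. That is one plausible reason a segmented point cloud transfers better from simulation than raw images.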
Demonstrating Tool Use
All grasping and finger motions are autonomous (zero-shot sim-to-real). Wrist movement during tool-to-site alignment is handled by 6-DOF teleoperation. The object is initialized randomly on the tabletop.
Difficulty of Teleoperation
Existing teleoperation systems expose only a noisy, position-based control interface. Yet articulated tool use demands precise, coordinated position-force control.
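A back-of-the-envelope spring model illustrates why a noisy position interface struggles here. If contact force scales with how far the position target is pushed into the surface, position noise turns directly into force noise. The linear spring model and the stiffness, target, and noise values below are assumptions for illustration, not measurements from the source.

```python
def contact_force(stiffness, penetration):
    """Linear-spring contact model: force (N) from penetration depth (m); zero when not in contact."""
    return stiffness * max(penetration, 0.0)

# Hypothetical numbers for illustration only.
k = 500.0        # effective finger/contact stiffness, N/m
target = 0.004   # intended depth of the position target into the surface, m (2 N nominal)
for noise in (-0.002, 0.0, 0.002):  # +/- 2 mm of teleoperation position noise
    print(f"noise {noise * 1000:+.0f} mm -> force {contact_force(k, target + noise):.1f} N")
```

With these numbers, ±2 mm of position noise swings the contact force between 1 N and 3 N around the intended 2 N. Errors of this size can easily push a contact out of its friction cone.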
Failure Cases
Dexterous articulated tool use remains highly challenging. Contact modes that differ by as little as 5 mm can require distinct force (action) profiles, and even small errors in force magnitude or direction can lead to immediate failure. In our early iterations, we found that insufficient keyframe coverage led to a brittle policy. Increasing state coverage through dense grasp keyframe sampling helps improve robustness.
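One simple proxy for "keyframe coverage" is the coverage radius: the worst-case distance from a state the policy may encounter to its nearest keyframe. The 3-D parameter space, uniform sampling, and metric below are illustrative assumptions. The source does not define how coverage is measured.

```python
import numpy as np

def coverage_radius(keyframes, queries):
    """Largest distance from any query state to its nearest keyframe (smaller = denser coverage)."""
    d = np.linalg.norm(queries[:, None, :] - keyframes[None, :, :], axis=-1)  # (Q, K) pairwise distances
    return float(d.min(axis=1).max())

rng = np.random.default_rng(0)
pool = rng.uniform(-1.0, 1.0, size=(512, 3))      # hypothetical 3-D grasp-parameter space
queries = rng.uniform(-1.0, 1.0, size=(1000, 3))  # states the policy might encounter
for n in (8, 64, 512):
    print(n, round(coverage_radius(pool[:n], queries), 3))
```

Because each smaller keyframe set is a prefix of the larger one, the radius can only shrink as sampling becomes denser. In this picture, a sparse set leaves large regions where the policy must extrapolate to contact modes it never saw.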
More on Grasping
Objects in, grasping trajectories out. Mana uses GPU-accelerated planning and simulation to generate diverse grasping trajectories at scale from many different initial conditions, making the visuomotor policy insensitive to how the object is initially placed.
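Generating trajectories from many initializations at scale typically means drawing a batch of randomized start states at once, one per parallel simulation environment. The sketch below does this on the CPU with NumPy. The workspace ranges, the planar (x, y, yaw) parameterization, and the function name are hypothetical. A GPU pipeline would typically perform the same sampling with batched tensors.

```python
import numpy as np

def sample_tabletop_poses(n, x_range, y_range, rng):
    """Sample n planar object poses (x, y, yaw) on the tabletop in one vectorized call."""
    x = rng.uniform(*x_range, size=n)
    y = rng.uniform(*y_range, size=n)
    yaw = rng.uniform(-np.pi, np.pi, size=n)  # full range of in-plane rotations
    return np.stack([x, y, yaw], axis=1)      # (n, 3): one row per parallel environment

poses = sample_tabletop_poses(4096, (-0.1, 0.1), (-0.1, 0.1), np.random.default_rng(0))
print(poses.shape)
```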
Synthesizing tabletop grasps for thin objects and tools (around 1 cm thick) can be challenging. To address this, we add depenetration position iterations to the kinematic optimization in the Lightning Grasp system. The enhanced system can generate diverse tabletop grasps for flat objects.
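The source does not detail how the depenetration iterations work. A common pattern, sketched here as an assumption, is a position-based projection. The hand is represented by collision spheres, a signed distance field (SDF) is queried for the tool and the table, and any penetrating sphere is repeatedly pushed out along the distance gradient. The box-shaped tool, sphere radius, and function names are hypothetical.

```python
import numpy as np

def box_sdf(p, half_extents):
    """Signed distance from points p (N, 3) to an axis-aligned box centered at the origin."""
    q = np.abs(p) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(np.max(q, axis=1), 0.0)
    return outside + inside

def scene_sdf(p, half_extents, table_z):
    """Distance to the nearer of the tool box and the table half-space z <= table_z."""
    return np.minimum(box_sdf(p, half_extents), p[:, 2] - table_z)

def depenetrate(points, radius, half_extents, table_z, iters=10, eps=1e-5):
    """Iteratively push collision spheres out of the scene along the SDF gradient.

    Each iteration moves every penetrating sphere by its penetration depth along a
    central finite-difference estimate of the SDF gradient (a generic projection step).
    """
    p = np.array(points, dtype=float)
    for _ in range(iters):
        d = scene_sdf(p, half_extents, table_z) - radius  # < 0 means the sphere penetrates
        pen = d < 0
        if not pen.any():
            break
        grad = np.zeros_like(p)
        for axis in range(3):
            offset = np.zeros(3)
            offset[axis] = eps
            grad[:, axis] = (scene_sdf(p + offset, half_extents, table_z)
                             - scene_sdf(p - offset, half_extents, table_z)) / (2 * eps)
        grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12
        p[pen] -= d[pen][:, None] * grad[pen]  # d < 0, so this moves points outward
    return p

# A 10 cm x 4 cm x 1 cm tool lying flat on a table at z = -0.005 m.
half = np.array([0.05, 0.02, 0.005])
tips = [[0.0, 0.0, 0.005], [0.1, 0.0, -0.005]]  # on the tool's top face; on the table beside it
print(depenetrate(tips, radius=0.008, half_extents=half, table_z=-0.005))
```

In this example, the fingertip sphere centered on the tool's top face is lifted until it just touches the surface. The sphere centered on the table plane beside the tool is pushed up until it rests on the tabletop. A projection like this could be interleaved with the grasp objective so that thin-object grasps stay collision-free.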
Concluding Remarks: An Illusion from Evolution
Articulated tool use is deceptively simple.
What appears effortless and trivial conceals a highly complex, reactive interplay of position and force.
In these quasi-static motions, stability relies on high-frequency closed-loop position-force control, continuously reacting to external perturbations and transitions in contact modes.
When control is correctly implemented, the hand remains smooth and unperturbed, as if nothing had happened.
Looking ahead, we hope advances in simulation, tactile integration, and system-level optimization can further improve robustness.
Equally important is combining learned manipulation skills with high-level spatial reasoning to enable robots to operate effectively in the real world.
We believe our proposed approach can also be adapted to train grasping and manipulation policies for other robotic manipulation problems.
In summary, we hope Mana and its subsystems can serve as a stepping stone toward more intelligent dexterous manipulation.