Dexterous Manipulation from Locomotion Perspective


Motivation

Although they seem like totally different classes of work in robotics, locomotion and dexterous manipulation in fact share some striking similarities: both work with similar hardware (a set of multiple identical kinematic chains), and the nature of the task itself (making selective contacts with an object or surface to achieve a goal) is also similar in a broader sense. If we attach conventional manipulators on top, the analogy appears again: a quadruped + arm is analogous to a manipulator + dexterous hand gripper.
 
Of course there are huge differences in the details, but the two are still similar enough to make one wonder: “What lessons can we take from quadruped locomotion for dexterous manipulation, and vice versa?”
In fact, I’m already seeing some papers with a similar motivation, such as the one from Malik’s group that adapts the idea of RMA (Rapid Motor Adaptation), originally a locomotion technique, to dexterous manipulation.
This discussion aims to brainstorm about what each field can learn and adapt from the other: techniques from the locomotion domain that can be applied to dexterous manipulation, and vice versa.
I wrote down some initial thoughts and observations below.
 

Differences at a glance

| | Dexterous Manipulation | Locomotion |
| --- | --- | --- |
| contact made with | diverse (graspable) objects | diverse terrains |
| diversity of objects/terrains handled via | a dataset of objects (e.g. parts of ShapeNet) | procedural random terrain generation |
| contact location | all parts of the hand: fingertips, fingers, palm | usually only the round feet |
| gravity direction | the entire SO(3), especially when the hand is attached to an arm | generally downwards; changes slightly when climbing uphill/downhill |
| degrees of freedom | 16 dof (4 fingers × 4 dof each) | 12 dof (4 legs × 3 dof each) |
| typical state inputs to the policy | depth + proprioception (visual information is crucial) | proprioception (mostly does not require visual information) |
| rewards during RL | tracking explicit spatial goals (e.g. SE(3) poses) + some auxiliary rewards | tracking time-derivatives (e.g. body velocity) + some auxiliary rewards |
| Sim2Real - DR | randomizes object friction, size, mass, joint control parameters, ... | randomizes terrain, ground friction, restitution, body mass/CoM, ... |
| Sim2Real - SysID | e.g. robot dynamics SysID with a massively parallel simulator | e.g. RMA (online SysID/adaptation at test time) |
| RL techniques | student-teacher learning | student-teacher learning, curriculum learning (e.g. increasing difficulty of terrain or tracked velocity) |
| failure recovery | - | often trains a separate policy that can recover from fallen-down states |
| other trends | tactile sensors being added to make the policy more robust | constraint-handling techniques beyond reward shaping (e.g. one from KAIST) gaining attention these days |
Let’s analyze these differences in more detail.
 

Details

Terrain Generations vs Object Datasets

As far as I know, locomotion policies are trained over randomized terrains generated by heuristic algorithms (e.g. using Perlin noise to generate 2D heightmaps, or the diamond-square algorithm for fractal terrains). Basically, they rely on terrain generation algorithms rather than a fixed dataset of terrains.
A summary of terrain generation techniques from one paper:
Terrains, in the domain of terrain-aware legged locomotion, are typically represented by heightmaps, i.e., two-dimensional matrices of real numbers indicating the height at different points. A traditional method for terrain generation is to use Perlin noise [19], as is adopted by existing works [3] [7]. Although policies can be trained in simulation with such terrains, verifications must be done on real robots after sim-to-real transfer, because using Perlin noise does not lead to realistic heightmaps [9].
Alternative methods are to generate fractal terrains, e.g., to use the diamond square algorithm [20] and the fractal brownian motion algorithm [21]. However, it is difficult to regard them as realistic.
An emerging way to generate realistic terrains is to use GANs [22], where a discriminator tries to classify whether a sample comes from the dataset, and a generator tries to cheat the discriminator by generating samples from noises. Examples of GAN-based terrain generation are [23] and [24]. Yet, to achieve partially controllable generation and actively generate a dataset, we need interactive terrain authoring based on conditional GANs [10]. To be specific, the discriminator classifies whether the samples together with certain features are from the training dataset, and the generator generates fake samples from not only noises but also the features. Finally, the generator can generate realistic terrains from given input features, and the noises only affect small-scale details.
This approach relies on the hope that such synthesized terrains reflect the diverse characteristics of real-world terrains well (at least locally), so that the trained controller behaves well in the real world. The major benefit of using terrain generation methods (with explicit parameters) is that we can control the difficulty of the task by adjusting the generation parameters, which naturally leads to the concept of curriculum learning as well.
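To make this concrete, here is a minimal sketch (using only NumPy) of a fractal-style heightmap generator with an explicit difficulty knob, the kind of parameter a terrain curriculum can anneal. The octave count, grid sizes, and height scales are arbitrary assumptions for illustration, not values from any specific paper.

```python
# A minimal sketch of a parameterized fractal-style heightmap generator.
# "difficulty" scales the overall height amplitude, which is the hook that
# curriculum learning typically exploits.
import numpy as np

def fractal_heightmap(size=64, octaves=4, persistence=0.5,
                      difficulty=1.0, rng=None):
    """Sum of upsampled random grids, roughly fractal Brownian motion.

    Assumes `size` is a power of two (so each coarse grid tiles it exactly).
    """
    rng = rng or np.random.default_rng()
    heightmap = np.zeros((size, size))
    amplitude = 1.0
    for octave in range(octaves):
        coarse = 2 ** (octave + 2)                    # 4, 8, 16, ... control points
        grid = rng.standard_normal((coarse, coarse))
        reps = size // coarse
        # nearest-neighbour upsample to full resolution (bilinear would be smoother)
        layer = np.kron(grid, np.ones((reps, reps)))
        heightmap += amplitude * layer
        amplitude *= persistence                      # finer octaves contribute less
    heightmap -= heightmap.min()
    return difficulty * heightmap / heightmap.max()   # difficulty sets max height (m)

# e.g. easy vs. hard terrain for a curriculum
easy = fractal_heightmap(difficulty=0.05)   # ~5 cm bumps
hard = fractal_heightmap(difficulty=0.30)   # ~30 cm bumps
```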
On the other hand, dexterous manipulation relies on a fixed (but sufficiently large) training dataset of objects. This approach relies on the hope that the training dataset covers a wide enough variety of objects so that the trained policy also generalizes well to unseen objects, which can sometimes be a bit unrealistic considering the vast range of real objects. Also, it is quite difficult to introduce a notion of “difficulty” for objects in this setting, which makes it poorly suited for curriculum learning as well.
It is thus natural, though not straightforward, to ask whether we can apply the same generation-based approach to dexterous manipulation: using 3D shape generation methods during training instead of fixed object datasets.
TC: It’s tempting to think about applying techniques used in locomotion to manipulation. But rather than starting from the techniques themselves, it might be better to start with: what are we trying to achieve by transferring the techniques, and why?
 

Rewards

As representative examples from both worlds, I took the formulations of Visual Dexterity (Chen et al.) and Rapid Locomotion (Margolis et al.) and compared their reward terms. Two observations:
  1. A fixed direction of gravity helps the locomotion world a lot (especially a direction that always enforces meaningful contacts with the world). While Visual Dexterity had to add reward terms just to make the fingers “touch” the object, Rapid Locomotion did not have to: gravity was on their side.
    1. TC: It highly depends on the robot morphology and the tasks themselves.
      YH: Tao Chen Yeah, this gravity issue being task-dependent makes sense as well. The gravity direction in the vegetable peeling system, for instance, helps the vegetable at least stay on the palm. But I guess the main point here was that gravity can act in an adversarial manner in some dexterous manipulation scenarios 🙂
      • PA: Younghyo Park The same is true in locomotion if one needs to jump or do some extreme parkour: gravity is adversarial.
  2. The locomotion world usually defines the task reward as tracking time-derivatives, rather than explicitly reaching a far-away spatial goal. This is a natural choice for locomotion tasks, since it’s intuitive to command a quadruped to walk or run forward rather than to reach a certain point in xy space. It can also be a better choice in the sense that it makes the task more “short-horizon” than actually reaching a distant goal. For dexterous manipulation papers, things are quite different: it’s quite common to train a policy to “reach” a certain goal, which can be a fairly long-horizon task.
    1. TC: I don’t think tracking velocity is the only way to specify a goal even in locomotion? There are also other papers that do things differently. I think it highly depends on what one wants to achieve (the project goal/demo). If one cares about running fast, then tracking velocity makes sense. If one cares about commanding robots to go somewhere, then using a spatial position as the goal doesn’t sound like a terrible idea?
      YH: Tao Chen Yeah, I agree with your point, and Gabe Margolis actually mentioned in a comment that there are papers that train locomotion policies as a point-reaching task. I was just wondering whether training reorientation policies with a spatial-velocity-tracking task would make things easier, since SO(3)-reaching tasks can sometimes require longer-horizon manipulation than simple velocity tracking.
      Younghyo Park There are papers doing that in manipulation, like tactile dexterity papers where people don’t care about stabilizing the object in the end.
 
While it’s not easy to overcome the adversarial gravity issue for dexterous manipulation tasks, it is rather easy to try the idea of tracking time-derivatives instead of reaching a spatial goal. In fact, some papers have already tried this in a limited sense, training policies to rotate an object about a certain axis rather than to reach a certain goal orientation.
In addition, replacing auxiliary rewards (which are basically just soft constraints) with other constraint-enforcing techniques is being explored in the locomotion domain as well. This would be a natural extension to deploy in the dexterous manipulation world, which also uses a lot of auxiliary rewards in its formulations.
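To make the contrast concrete, here is a minimal sketch of the two reward styles for an in-hand reorientation task. The specific functional forms (an exponential tracking kernel, a geodesic SO(3) distance) are illustrative assumptions, not the exact terms used in Visual Dexterity or Rapid Locomotion.

```python
# A minimal sketch contrasting velocity-tracking vs. goal-reaching rewards
# for in-hand reorientation. Quaternions are (w, x, y, z) and unit-norm.
import numpy as np

def velocity_tracking_reward(obj_ang_vel, commanded_ang_vel, sigma=1.0):
    """Locomotion-style: track a time-derivative (e.g. 'keep spinning about z')."""
    err = np.linalg.norm(obj_ang_vel - commanded_ang_vel)
    return np.exp(-(err ** 2) / sigma ** 2)           # bounded, short-horizon signal

def goal_reaching_reward(obj_quat, goal_quat):
    """Manipulation-style: shrink the rotation angle to an explicit SO(3) goal."""
    dot = np.clip(abs(np.dot(obj_quat, goal_quat)), -1.0, 1.0)
    angle_to_goal = 2.0 * np.arccos(dot)               # geodesic distance on SO(3)
    return -angle_to_goal                               # long-horizon, spatial signal

# e.g. "rotate about +z at 1 rad/s" vs. "reach this target orientation"
r_vel  = velocity_tracking_reward(np.array([0.0, 0.0, 0.8]), np.array([0.0, 0.0, 1.0]))
r_goal = goal_reaching_reward(np.array([1.0, 0.0, 0.0, 0.0]),
                              np.array([0.92, 0.0, 0.0, 0.38]))
```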
 
 

RL techniques: Curriculum Learning

One major training technique that allowed Rapid Locomotion (Margolis et al.) to achieve its impressive robustness and speed was curriculum learning: gradually increasing the complexity of the task over the course of training. The challenge there was to design the right curriculum that gives appropriately difficult tasks at the right stage of training, e.g., their Box-Adaptive/Grid-Adaptive curricula.
Extending this idea to dexterous manipulation seems straightforward. Adopting a velocity-tracking task reward allows a natural extension of Rapid Locomotion’s curriculum learning technique; see the sketch below. Applying curriculum learning to a goal-reaching task reward might be a bit more involved, but either way, coming up with a nice curriculum strategy could be a valuable addition to the dexterous manipulation world.
TC: In manipulation, people also use curriculum learning. Whether one uses it or not depends on the task again; sometimes it doesn’t provide significant benefits, while other times it can be very beneficial.
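Here is a minimal sketch of what such an adaptive command curriculum could look like for a velocity-tracking reorientation task. This is my own simplification, not Rapid Locomotion’s exact Grid-Adaptive scheme; the growth factor, thresholds, and speed limits are placeholder values.

```python
# A minimal sketch of an adaptive command curriculum: the range of commanded
# object spin rates grows whenever recent tracking performance clears a threshold.
import numpy as np

class CommandCurriculum:
    def __init__(self, start_max=0.5, hard_max=4.0, grow=1.2, threshold=0.8):
        self.max_speed = start_max   # current upper bound on |commanded spin| (rad/s)
        self.hard_max = hard_max     # never exceed this
        self.grow = grow             # multiplicative expansion factor
        self.threshold = threshold   # mean tracking reward needed to expand

    def sample_command(self, rng):
        return rng.uniform(-self.max_speed, self.max_speed)

    def update(self, mean_tracking_reward):
        if mean_tracking_reward > self.threshold:
            self.max_speed = min(self.max_speed * self.grow, self.hard_max)

# usage inside a training loop (sketch)
rng = np.random.default_rng(0)
curriculum = CommandCurriculum()
for epoch in range(3):
    command = curriculum.sample_command(rng)   # condition the policy on this command
    mean_reward = 0.85                         # placeholder: evaluate rollouts here
    curriculum.update(mean_reward)
```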
Although it wasn’t used in the Rapid Locomotion paper, there is also a line of work that runs curriculum learning over different terrains, using the terrain generation technique to control the difficulty of the terrain over the course of training.
The analogue of controlling terrain complexity over the course of training might be presenting increasingly complex objects during training, which is not really straightforward. How do we define complexity for an object? While object properties like friction and mass are quite straightforward to build a curriculum over, creating 3D shapes of increasing complexity is not so simple. But there is still some possibility here too: there has been impressive recent work on training generative models for 3D shapes.
 

Failure Recovery

Since a quadruped can do pretty much nothing when it’s flipped over (or in some other failure state), there is a line of work in the locomotion domain that focuses on “failure recovery” behaviors, one example being the work below:
Such a failure recovery component is a nice addition, giving the system more autonomy in general and handling corner cases better. The dexterous manipulation world could adopt a similar system as well, for instance, regrasping the object and retrying reorientation when the object drops mid-task.
P.S. This might be the one component that is easier to implement in the dexterous manipulation world than in the locomotion world: simply grasping the object again and retrying might suffice as a failure recovery strategy.
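A minimal sketch of what such a regrasp-and-retry wrapper could look like. The `reorient_policy`, `regrasp_policy`, and `object_is_dropped` interfaces are hypothetical placeholders, not from any particular paper.

```python
# A minimal sketch of a failure-recovery wrapper for in-hand reorientation:
# if the object drops mid-episode, switch to a regrasp policy and retry.
def run_with_recovery(env, reorient_policy, regrasp_policy,
                      object_is_dropped, max_retries=3, regrasp_horizon=200):
    obs = env.reset()
    for attempt in range(max_retries + 1):
        # reorientation phase: run until the episode ends or the object drops
        dropped, done = False, False
        while not done and not dropped:
            obs, _, done, _ = env.step(reorient_policy(obs))
            dropped = object_is_dropped(obs)
        if not dropped:
            return True                       # episode ended without dropping
        # recovery phase: regrasp the object, then retry reorientation
        for _ in range(regrasp_horizon):
            obs, _, _, _ = env.step(regrasp_policy(obs))
    return False                              # still failing after max_retries
```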
 

Sim2Real - DR

Again comparing Visual Dexterity and Rapid Locomotion, Domain Randomization was applied very similarly. Besides some extra randomization of state observations and control parameters in the Visual Dexterity paper, the core components of DR were nearly identical.
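For illustration, here is a minimal sketch of the shared DR recipe: resample physics parameters from hand-picked ranges at every episode reset. The parameter names and ranges below are assumptions for illustration, not the actual values used in either paper.

```python
# A minimal sketch of a domain randomization sampler shared by both domains.
import numpy as np

DR_RANGES = {
    # common to both domains
    "friction":        (0.3, 1.5),
    "mass_scale":      (0.8, 1.2),
    "joint_kp_scale":  (0.8, 1.2),
    "joint_kd_scale":  (0.8, 1.2),
    # manipulation-specific
    "object_scale":    (0.95, 1.05),
    # locomotion-specific
    "restitution":     (0.0, 0.4),
}

def sample_dr_params(rng=None):
    """Draw one set of physics parameters, typically at every environment reset."""
    rng = rng or np.random.default_rng()
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in DR_RANGES.items()}

params = sample_dr_params()   # apply these to the simulator before the episode
```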
 

Sim2Real - SysID

System Identification (SysID) is a nice complement to Domain Randomization when it comes to Sim2Real issues. As clearly stated in the Visual Dexterity paper, extreme domain randomization can make the policy overly conservative, leading to sub-optimal performance. SysID can be a nice solution to this problem.
One of the SysID (+ online adaptation) techniques that the locomotion domain often uses is RMA (Rapid Motor Adaptation). It abstracts various terrain properties into a latent vector during training, and then tries to infer that latent from the history of actions and observations.
This technique builds on the usual form of teacher-student training that leverages privileged information, but adds a component that explicitly conditions the policy on an implicit estimate of the system inferred from past interaction history. It has proven effective in dealing with extremely diverse terrain scenarios.
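Here is a minimal sketch of the two-phase RMA recipe in PyTorch. The network sizes, dimensions, and the choice of a plain MLP over a flattened history buffer are placeholder assumptions, not the architecture from the RMA paper.

```python
# A minimal sketch of the RMA idea: a privileged encoder compresses environment
# factors into a latent z during training (phase 1), and an adaptation module
# learns to regress that latent from recent observation-action history (phase 2)
# so it can replace the privileged encoder at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F

ENV_DIM, OBS_DIM, ACT_DIM, LATENT_DIM, HIST_LEN = 17, 48, 16, 8, 50

env_encoder = nn.Sequential(nn.Linear(ENV_DIM, 64), nn.ELU(),
                            nn.Linear(64, LATENT_DIM))            # mu(e) -> z
policy = nn.Sequential(nn.Linear(OBS_DIM + LATENT_DIM, 128), nn.ELU(),
                       nn.Linear(128, ACT_DIM))                   # pi(obs, z) -> action
adaptation = nn.Sequential(nn.Linear(HIST_LEN * (OBS_DIM + ACT_DIM), 128), nn.ELU(),
                           nn.Linear(128, LATENT_DIM))            # phi(history) -> z_hat

# Phase 1 (teacher): z comes from privileged env factors; policy is trained with RL.
e   = torch.randn(1, ENV_DIM)          # privileged factors (friction, mass, ...)
obs = torch.randn(1, OBS_DIM)
z   = env_encoder(e)
action = policy(torch.cat([obs, z], dim=-1))

# Phase 2 (student): regress z from interaction history; policy weights stay frozen.
history = torch.randn(1, HIST_LEN * (OBS_DIM + ACT_DIM))
z_hat = adaptation(history)
adaptation_loss = F.mse_loss(z_hat, z.detach())
adaptation_loss.backward()             # at test time, use z_hat in place of z
```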
Adapting this online SysID technique to dexterous manipulation is very straightforward, and the paper below (also from Malik’s group) did exactly that.
The Visual Dexterity paper also did some important SysID at the robot dynamics level. Leveraging a massively parallel simulator, it estimated the right robot dynamics parameters, removing one layer of the Sim2Real gap.
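A minimal sketch of how such offline dynamics SysID could look: sample many candidate dynamics parameters, roll out the real robot’s action sequence for each candidate in parallel, and keep the best-fitting set. `simulate_batch` is an assumed interface to a massively parallel simulator, not an actual API, and the parameter set and ranges are placeholders.

```python
# A minimal sketch of offline robot dynamics SysID via parallel search.
import numpy as np

def identify_dynamics(real_actions, real_joint_traj, simulate_batch,
                      num_candidates=4096, rng=None):
    """Return the candidate dynamics parameters whose simulated joint
    trajectory best matches the recorded real trajectory."""
    rng = rng or np.random.default_rng()
    # candidate (damping, friction, armature) scales, sampled around nominal = 1
    candidates = rng.uniform(0.5, 1.5, size=(num_candidates, 3))
    # replay the real action sequence under each candidate, one per parallel env;
    # expected output shape: (num_candidates, T, num_joints)
    sim_trajs = simulate_batch(candidates, real_actions)
    errors = np.mean((sim_trajs - real_joint_traj[None]) ** 2, axis=(1, 2))
    return candidates[np.argmin(errors)]      # best-fitting dynamics parameters
```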
What other SysID techniques from the locomotion domain could we use for dexterous manipulation?