Robot Learning of Mobile Manipulation with Reachability Behavior Priors

Authors: Snehal Jauhri, Jan Peters and Georgia Chalvatzaki


Mobile Manipulation (MM) systems are ideal candidates for taking up the role of a personal assistant in unstructured real-world environments. Among other challenges, MM requires effective coordination of the robot’s embodiments for executing tasks that require both mobility and manipulation. Reinforcement Learning (RL) holds the promise of endowing robots with adaptive behaviors, but most methods require prohibitively large amounts of data for learning a useful control policy. In this work, we study the integration of robotic reachability priors in RL methods for accelerating the learning of MM. Namely, we consider the problem of optimal base placement and the subsequent decision of whether to activate the arm for reaching a 6D target. We derive Boosted Hybrid RL (BHyRL), a novel actor-critic algorithm that benefits from modeling Q-functions as a sum of residual approximators. Every time a new task needs to be learned, we can transfer our learned residuals and learn only the task-specific component of the Q-function, thereby maintaining the task structure from prior behaviors.

Our Contributions

  • We propose to use a hybrid action-space Reinforcement Learning algorithm for effectively tackling the need for discrete and continuous action decisions in Mobile Manipulation
  • We learn a reachability behavioral prior for Mobile Manipulation that can speed up the learning process, and incentivize the agent to select kinematically reachable base poses when dealing with 6D reaching and fetching tasks
  • We propose a new algorithm, Boosted Hybrid RL (BHyRL), for transferring knowledge from behavior priors by modelling Q-functions as sums of residuals, while also regularizing policy learning in a trust-region fashion
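
As a rough illustration of what a hybrid action could look like for the base-placement problem described above, the sketch below pairs a continuous base pose with a discrete arm-activation decision. The `HybridAction` class, the `sample_hybrid_action` helper, and the Gaussian/Bernoulli parameterization are assumptions made for this example, not the released implementation.

```python
from dataclasses import dataclass
import math
import random


@dataclass
class HybridAction:
    # Continuous part: hypothetical base placement (x, y in metres, yaw in radians).
    base_x: float
    base_y: float
    base_yaw: float
    # Discrete part: whether to activate the arm for reaching.
    activate_arm: bool


def sample_hybrid_action(mean, std, arm_prob, rng):
    """Sample a base pose from a Gaussian and an arm decision from a Bernoulli."""
    x, y, yaw = (rng.gauss(m, std) for m in mean)
    # Wrap yaw to [-pi, pi) so the base orientation stays well defined.
    yaw = (yaw + math.pi) % (2 * math.pi) - math.pi
    return HybridAction(x, y, yaw, rng.random() < arm_prob)


# Example values are arbitrary; a learned policy would output them per state.
action = sample_hybrid_action(mean=(0.5, 0.0, 0.0), std=0.1, arm_prob=0.8,
                              rng=random.Random(0))
```

Keeping both parts in one action object means a single policy step decides where the base goes and whether the arm should attempt the reach from there.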

Boosted Hybrid RL

  • The concept of ‘boosting’ [1] is to combine many weak learners to create a single strong learner.
  • To learn challenging base placement tasks, we first learn simpler reachability tasks and use the learned behavior as a prior for accelerating the learning of subsequent tasks.
  • To do this, the Q-function of every task is modelled as the sum of the residuals learned on previous tasks plus a new task-specific residual [2],[3].
  • Thus, we can progressively learn more difficult tasks while retaining the information and structure provided by the prior Q values.
  • Additionally, we regularize the new task policy using a KL-divergence penalty with the previous policy.
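
The points above can be sketched in a few lines. This is a minimal toy version under stated assumptions, not the authors' implementation: the residuals are linear functions of the state-action vector, the names `LinearResidual`, `BoostedQ`, and `gaussian_kl` are hypothetical, the policies are assumed to be diagonal Gaussians, and the direction KL(new || old) is chosen only for illustration.

```python
import numpy as np


class LinearResidual:
    """Toy residual approximator rho(s, a) = w . [s, a]; illustrative only."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.1, size=dim)

    def __call__(self, state, action):
        return float(self.w @ np.concatenate([state, action]))


class BoostedQ:
    """Q-value as a sum of frozen prior residuals plus one trainable residual."""

    def __init__(self, prior_residuals, new_residual):
        self.prior_residuals = list(prior_residuals)  # frozen, from earlier tasks
        self.new_residual = new_residual              # fitted on the current task

    def prior_value(self, state, action):
        return sum(r(state, action) for r in self.prior_residuals)

    def __call__(self, state, action):
        return self.prior_value(state, action) + self.new_residual(state, action)

    def update(self, state, action, td_target, lr=0.05):
        # Only the new residual moves; it fits what the priors do not explain.
        features = np.concatenate([state, action])
        error = td_target - self(state, action)
        self.new_residual.w += lr * error * features
        return error


def gaussian_kl(mean_new, std_new, mean_old, std_old):
    """KL(new || old) for diagonal Gaussians, summed over action dimensions."""
    mean_new, std_new, mean_old, std_old = (
        np.asarray(v, dtype=float) for v in (mean_new, std_new, mean_old, std_old)
    )
    quad = (std_new**2 + (mean_new - mean_old) ** 2) / (2.0 * std_old**2)
    return float(np.sum(np.log(std_old / std_new) + quad - 0.5))


rng = np.random.default_rng(0)
state, action = np.array([1.0, -0.5]), np.array([0.2])
prior = LinearResidual(3, rng)
prior_w_before = prior.w.copy()
q = BoostedQ([prior], LinearResidual(3, rng))

before = abs(1.0 - q(state, action))
for _ in range(50):
    q.update(state, action, td_target=1.0)
after = abs(1.0 - q(state, action))

# Penalty term that would be weighted and added to the new policy's loss.
kl_penalty = gaussian_kl([1.0], [1.0], [0.0], [1.0])
```

Freezing the prior residuals is what preserves the structure of earlier tasks: the new task can only add a correction on top of them, while the KL term keeps the new policy close to the previous one.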

Simulated tasks for 6D Reaching & Fetching

The agent learns progressively more challenging tasks and combines each of the learned behaviors:

6D_Reach_1m task

The agent needs to reach a 6D target in its vicinity (within a 1-metre radius) by choosing an optimal base location and activating its arm for reaching.

6D_Reach_5m task

The agent needs to navigate towards a 6D target that is up to 5 metres away. The 6D_Reach_1m behavior is used as a prior.

6D_Reach_3_obst task

Similar to the 6D_Reach_5m task, but now in the presence of 3 obstacles. The 6D_Reach_1m and 6D_Reach_5m behaviors are used as priors.

6D_Fetch_table/wall task

The agent needs to fetch an object placed on a table in the presence of a wall behind the table. The 6D_Reach_1m and 6D_Reach_5m behaviors are used as priors.

6D_Fetch_2_furniture task

The agent needs to fetch an object placed on a table in the presence of another furniture obstacle. The 6D_Reach_1m, 6D_Reach_5m and 6D_Reach_3_obst behaviors are used as priors.

6D_Fetch_multiobj task

The agent needs to fetch an object placed on a table without colliding with multiple other objects on the table. The 6D_Reach_1m and 6D_Reach_5m behaviors are used as priors.
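
The prior relationships listed for the tasks above form a small dependency graph. The sketch below records them as a dictionary and derives one valid training order; the task names and their priors come from the descriptions above, while the `TASK_PRIORS` and `training_order` names are hypothetical, and the helper assumes the graph has no cycles.

```python
# Priors used by each task, as listed in the task descriptions.
TASK_PRIORS = {
    "6D_Reach_1m": [],
    "6D_Reach_5m": ["6D_Reach_1m"],
    "6D_Reach_3_obst": ["6D_Reach_1m", "6D_Reach_5m"],
    "6D_Fetch_table/wall": ["6D_Reach_1m", "6D_Reach_5m"],
    "6D_Fetch_2_furniture": ["6D_Reach_1m", "6D_Reach_5m", "6D_Reach_3_obst"],
    "6D_Fetch_multiobj": ["6D_Reach_1m", "6D_Reach_5m"],
}


def training_order(task_priors):
    """Order tasks so that every prior is trained before the task that uses it."""
    order, visited = [], set()

    def visit(task):
        if task in visited:
            return
        visited.add(task)
        # Depth-first: make sure all priors are scheduled first.
        for prior in task_priors[task]:
            visit(prior)
        order.append(task)

    for task in task_priors:
        visit(task)
    return order


order = training_order(TASK_PRIORS)
```

Any order in which each task appears after all of its priors would work equally well; this one simply follows the sequence in which the tasks are introduced.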

Demonstration of zero-shot transfer of BHyRL policy

“Will this method work for my robot?”

Our training method can also work for other mobile manipulators such as the above ‘Fetch’ robot. Our codebase will be released along with guidelines on how to train your own robot.




This research received funding from the German Research Foundation (DFG) Emmy Noether Programme (#448644653) and the RoboTrust project of the Centre Responsible Digitality Hessen, Germany.


  1. Yoav Freund, “Boosting a weak learning algorithm by majority”, Information and Computation, 121(2):256–285, 1995.
  2. Samuele Tosatto, Matteo Pirotta, Carlo D’Eramo, and Marcello Restelli, “Boosted fitted Q-iteration”, International Conference on Machine Learning, 2017.
  3. Pascal Klink, Carlo D’Eramo, Jan Peters, and Joni Pajarinen, “Boosted curriculum reinforcement learning”, International Conference on Learning Representations, 2022.