A core challenge in scaling up Deep Reinforcement Learning (Deep RL) to robotic tasks of practical interest is the task specification problem, which typically manifests as the difficulty of reward design. To ease reward design in continuous robotics environments, we develop a method that leverages abstract task plans to automatically densify sparse, goal-based rewards while preserving the optimal policy.

We hypothesize that for many robotic tasks,

  • while it is difficult for humans to specify a dense reward that cannot be hacked, it is easy to specify an abstract plannable model in PDDL that conveys information about the dynamics of the domain, and
  • that valid abstract plans within this model can be leveraged to automatically densify the sparse reward via potential-based reward shaping (see the shaping equation below), sufficiently for state-of-the-art RL approaches to solve these tasks.
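
Concretely, potential-based reward shaping (Ng, Harada, and Russell, 1999) adds the discounted difference of a potential function Φ over successive states to the environment reward, and shaping of this form provably leaves the optimal policy unchanged. In our setting, Φ scores progress along the abstract plan; how Φ is derived is exactly what the variants listed below explore. In LaTeX notation:

    r'(s, a, s') = r(s, a, s') + \gamma \Phi(s') - \Phi(s)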

We perform an extensive empirical evaluation of our system across PDDL models of varying granularity, choices of potential function, learning algorithms (PPO and SAC), and tasks.
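
As a minimal sketch of how such shaping slots into training (assuming a Gymnasium-style environment; the wrapper and the example potential below are illustrative stand-ins, not our exact implementation):

    # Illustrative only: a Gymnasium wrapper that applies potential-based
    # shaping, r' = r + gamma * phi(s') - phi(s), on top of a sparse reward.
    import gymnasium as gym
    import numpy as np


    class ShapedReward(gym.Wrapper):
        def __init__(self, env, potential, gamma=0.99):
            super().__init__(env)
            self.potential = potential  # maps an observation to a scalar
            self.gamma = gamma
            self._last_phi = None

        def reset(self, **kwargs):
            obs, info = self.env.reset(**kwargs)
            self._last_phi = self.potential(obs)
            return obs, info

        def step(self, action):
            obs, reward, terminated, truncated, info = self.env.step(action)
            phi = self.potential(obs)
            shaped = reward + self.gamma * phi - self._last_phi
            self._last_phi = phi
            return obs, shaped, terminated, truncated, info


    def nearest_subgoal_potential(subgoals):
        # Hypothetical potential: negative distance to the nearest plan
        # subgoal, one plausible reading of a "distance varying" variant.
        # Assumes the first 3 observation dims are the end-effector position.
        def phi(obs):
            pos = np.asarray(obs[:3])
            return -min(np.linalg.norm(pos - g) for g in subgoals)
        return phi

Because only the reward signal changes, the same off-the-shelf PPO or SAC learner can be trained against any choice of potential without modification.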

Report | Presentation | Code


Environments

Reaching - PPO

Handcrafted: Sparse | Dense
Plan Based: Single Subgoal | Multi Subgoal | Grid Based
Time Varying: Single Subgoal | Multi Subgoal | Grid Based
Distance Varying: Single Subgoal | Multi Subgoal | Grid Based


Reaching - SAC

Handcrafted: Sparse | Dense
Plan Based: Single Subgoal | Multi Subgoal | Grid Based
Time Varying: Single Subgoal | Multi Subgoal | Grid Based
Distance Varying: Single Subgoal | Multi Subgoal | Grid Based


Pushing - PPO

Handcrafted: Sparse | Dense
Plan Based: Single Subgoal | Multi Subgoal | Grid Based
Time Varying: Single Subgoal | Multi Subgoal | Grid Based
Distance Varying: Single Subgoal | Multi Subgoal | Grid Based


Pushing - SAC

Handcrafted: Sparse | Dense
Plan Based: Single Subgoal | Multi Subgoal | Grid Based
Time Varying: Single Subgoal | Multi Subgoal | Grid Based
Distance Varying: Single Subgoal | Multi Subgoal | Grid Based


Maze-Reach - PPO

Handcrafted: Sparse
Distance Varying: Single Subgoal | Multi Subgoal | Grid Based