Automatic Reward Densification
A core challenge in scaling Deep Reinforcement Learning (Deep RL) to robotic tasks of practical interest is the task specification problem, which typically manifests as the difficulty of reward design. To reduce this difficulty in continuous robotics environments, we propose a method that leverages task plans to automatically densify sparse, goal-based rewards in robotic tasks while preserving the optimal policy.
We hypothesize that, for many robotic tasks,
- while it is difficult for humans to specify a dense reward that cannot be hacked, it is easy to specify an abstract, plannable model in PDDL that conveys information about the dynamics of the domain, and
- valid abstract plans within this model can be leveraged, via potential-based reward shaping, to densify the sparse reward automatically and sufficiently for state-of-the-art RL approaches to solve these tasks.
We perform an extensive empirical evaluation of our system across PDDL models of varying granularity, choices of potential function, learning algorithms (PPO and SAC), and tasks.
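To make the potential-based reward shaping mentioned above concrete, the following is a minimal sketch and not the project's implementation. It assumes the abstract plan has already been turned into a sequence of 2D waypoints, and it uses a hypothetical potential that scores a state by how far along the plan its nearest waypoint lies. The names `plan_potential` and `shaped_reward` and the discount `gamma` are illustrative assumptions. The shaping term has the standard form F(s, s') = gamma * Phi(s') - Phi(s), which is the form known to leave the optimal policy unchanged.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]  # assumption: states are 2D positions


def plan_potential(state: State, waypoints: Sequence[State]) -> float:
    """Hypothetical potential: fraction of the plan reached by the nearest waypoint."""
    distances = [math.hypot(state[0] - w[0], state[1] - w[1]) for w in waypoints]
    nearest_index = distances.index(min(distances))
    return (nearest_index + 1) / len(waypoints)


def shaped_reward(
    sparse_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Add the potential-based shaping term gamma * Phi(s') - Phi(s) to the sparse reward."""
    return sparse_reward + gamma * potential(next_state) - potential(state)


# Illustrative plan: three waypoints along a straight line.
waypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]


def phi(s: State) -> float:
    return plan_potential(s, waypoints)


# Moving from near the first waypoint to near the second earns a positive bonus
# even though the sparse reward is still zero.
step_reward = shaped_reward(0.0, (0.1, 0.0), (0.9, 0.0), phi)
```

With `gamma = 1`, the shaping terms along a trajectory telescope to `Phi(s_T) - Phi(s_0)`, so the bonus depends only on the start and end states and not on the route taken between them.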
Report | Presentation | Code
Environments
Reaching - PPO
- Handcrafted: Sparse, Dense
- Plan Based: Single Subgoal, Multi Subgoal, Grid Based
- Time Varying: Single Subgoal, Multi Subgoal, Grid Based
- Distance Varying: Single Subgoal, Multi Subgoal, Grid Based
Reaching - SAC
- Handcrafted: Sparse, Dense
- Plan Based: Single Subgoal, Multi Subgoal, Grid Based
- Time Varying: Single Subgoal, Multi Subgoal, Grid Based
- Distance Varying: Single Subgoal, Multi Subgoal, Grid Based
Pushing - PPO
- Handcrafted: Sparse, Dense
- Plan Based: Single Subgoal, Multi Subgoal, Grid Based
- Time Varying: Single Subgoal, Multi Subgoal, Grid Based
- Distance Varying: Single Subgoal, Multi Subgoal, Grid Based
Pushing - SAC
- Handcrafted: Sparse, Dense
- Plan Based: Single Subgoal, Multi Subgoal, Grid Based
- Time Varying: Single Subgoal, Multi Subgoal, Grid Based
- Distance Varying: Single Subgoal, Multi Subgoal, Grid Based
Maze-Reach - PPO
- Handcrafted: Sparse
- Distance Varying: Single Subgoal, Multi Subgoal, Grid Based