
DCOB: Action Space for Motion Learning of Large DoF Robots

Reinforcement Learning (RL) methods enable a robot to acquire behaviors solely from an objective expressed as a reward (objective) function. However, handling a high-dimensional control input space (e.g. that of a humanoid robot) is still an open problem.

The aim of this research is to develop an action space for RL with which a robot can learn high-performance motions quickly.

We proposed a discrete action set named DCOB, which stands for actions Directed to the Center Of a Basis function. DCOB is generated from the basis functions (BFs) given for approximating the value function. Although DCOB is a discrete set, it can acquire high-performance motions.
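The construction above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual formulation: the function names, the pairing of each BF center with a candidate duration, and the linear trajectory expansion are all simplifying assumptions.

```python
import numpy as np

def dcob_actions(bf_centers, intervals):
    """Enumerate DCOB-style discrete actions: each action pairs a
    basis-function (BF) center (the target the motion is directed to)
    with a candidate duration tau. Hypothetical simplification."""
    return [(g, tau) for g in bf_centers for tau in intervals]

def action_to_trajectory(q_current, action, n_steps=4):
    """Expand one discrete action into a short reference trajectory:
    linearly interpolate from the current joint state toward the
    chosen BF center over the action's duration tau."""
    g, tau = action
    dt = tau / n_steps
    return [(dt * (k + 1),
             q_current + (g - q_current) * (k + 1) / n_steps)
            for k in range(n_steps)]
```

For example, 3 BF centers combined with 2 candidate durations give a discrete set of 6 actions, so the size of the action space scales with the number of BFs rather than with the raw control dimension.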

As an extension, we proposed WF-DCOB, which uses wire-fitting to learn in the continuous action space that DCOB discretizes. Thus, WF-DCOB has the potential to achieve higher performance than DCOB. So far, however, the performance of the acquired motions is almost the same, due to the instability of learning with wire-fitting.
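Wire-fitting (in the style of Baird and Klopf) represents Q(s, a) as an interpolation over a set of control "wires" (u_i, q_i), where u_i is an action and q_i its value; the interpolation is built so that the maximum of Q over actions lies exactly at the wire with the largest q_i, which makes greedy action selection trivial. A minimal sketch of the interpolator, with the smoothing constant `c` and the wire representation chosen purely for illustration:

```python
import numpy as np

def wire_fitting_q(a, wires, c=0.1, eps=1e-6):
    """Wire-fitting interpolation of Q at action a.

    wires: list of (u_i, q_i) pairs, u_i an action vector, q_i its value.
    Each wire gets weight 1 / (||a - u_i||^2 + c*(q_max - q_i) + eps),
    so Q passes through every (u_i, q_i) and its maximum over a is
    attained at the wire with the largest q_i.
    """
    a = np.asarray(a)
    us = np.array([u for u, _ in wires])
    qs = np.array([q for _, q in wires])
    d = np.sum((us - a) ** 2, axis=1) + c * (qs.max() - qs) + eps
    w = 1.0 / d
    return float(np.sum(w * qs) / np.sum(w))
```

In WF-DCOB one would place the initial wires at the DCOB actions and then adapt both u_i and q_i during learning, so the actions can drift continuously away from the discrete set.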

Application to Motion Learning of Robots

Learning Jumping

This is an application of DCOB to learning a jumping motion of a simulated humanoid robot. As the RL method, Peng's Q(λ)-learning is used.

In the early stage of learning, the robot acts randomly because it learns from scratch (i.e. with no prior knowledge).

After learning, the robot acquires a jumping motion.

Learning Crawling

DCOB is applicable to various motions. This is an example of learning a crawling motion.

In the early stage of learning, the robot's behavior is quite similar to that in the jumping task, because it again learns from scratch.

After learning, the robot acquires a crawling motion.

Crawling by a Real Robot

Only in simulation? No! Our method is also applicable to a real robot. We apply DCOB to a crawling task of a real spider-type robot (Bioloid, made by ROBOTIS).

This is the learning phase. Again, the robot learns from scratch (without any simulation!).

After learning, the robot successfully acquires a crawling motion (in around 30 minutes).

Another viewpoint:

Video for AURO2013


DCOB is implemented in SkyAI. You can download and test it!

Related Papers