In general, Reinforcement Learning work has concentrated on problems with a single goal. As the complexity of problems scales up, both the size of the statespace and the complexity of the reward function increase. We will clearly be interested in methods of breaking problems up into subproblems which can work with smaller statespaces and simpler reward functions, and then having some method of combining the subproblems to solve the main task.
Most of the work in RL either designs the decomposition by hand [Moore, 1990], or deals with problems where the sub-tasks have termination conditions and combine sequentially to solve the main problem [Singh, 1992, Tham and Prager, 1994].
The Action Selection problem essentially concerns subtasks acting in parallel, and interrupting each other rather than running to completion. Typically, each subtask can only ever be partially satisfied [Maes, 1989].
Lin has devised a form of multi-module RL suitable for such problems, and this will be the second method tested below.
Lin [Lin, 1993] suggests breaking up a complex problem into sub-problems, having a collection of Q-learning agents learn the sub-problems, and then having a single controlling Q-learning agent learn Q(x,i), where i indicates which agent to choose in state x. This is clearly an easier function to learn than Q(x,a), since the sub-agents have already learnt sensible actions. When the creature observes state x, each agent i suggests an action a_i. The switch chooses a winner k and executes its action a_k.
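A minimal sketch of this arrangement, under stated assumptions, might look like the following. The class names, the tabular Q-representation, and the learning parameters are illustrative choices, not taken from Lin's work: the point is only that the sub-agents learn Q_i(x,a) over primitive actions, while the switch learns Q(x,i) over choices of sub-agent.

import random
from collections import defaultdict

class QAgent:
    """A tabular Q-learner over (state, action) pairs, trained on one sub-problem."""
    def __init__(self, actions, alpha=0.2, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)          # Q[(state, action)] -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def suggest(self, x):
        """Return this agent's preferred action a_i in state x (with some exploration)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(x, a)])

    def update(self, x, a, r, x_next):
        """Standard one-step Q-learning update against this agent's own reward r."""
        best_next = max(self.Q[(x_next, b)] for b in self.actions)
        self.Q[(x, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(x, a)])

class Switch:
    """The controlling agent: learns Q(x,i), i.e. which sub-agent to obey in state x."""
    def __init__(self, sub_agents, alpha=0.2, gamma=0.9, epsilon=0.1):
        self.sub_agents = sub_agents
        self.Q = defaultdict(float)          # Q[(state, agent_index)] -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, x):
        """Pick a winning sub-agent k and return (k, its suggested action a_k)."""
        indices = range(len(self.sub_agents))
        if random.random() < self.epsilon:
            k = random.choice(list(indices))
        else:
            k = max(indices, key=lambda i: self.Q[(x, i)])
        return k, self.sub_agents[k].suggest(x)

    def update(self, x, k, r, x_next):
        """Q-learning update over choices of sub-agent, against the global reward r."""
        best_next = max(self.Q[(x_next, i)] for i in range(len(self.sub_agents)))
        self.Q[(x, k)] += self.alpha * (r + self.gamma * best_next - self.Q[(x, k)])

In use, the sub-agents would first be trained on their own sub-problems with their own reward functions; the switch is then trained on the global reward, updating Q(x,k) only for the winning agent k whose action was executed.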
Lin concentrates on problems where subtasks combine to solve a global task, but one may equally apply the architecture to problems where the sub-agents simply compete and interfere with each other, that is, to classic action selection problems.