Dr. Mark Humphrys

School of Computing. Dublin City University.

Online coding site: Ancient Brain

coders   JavaScript worlds

Search:

Coding the state-space as a lookup-table

How do we actually code the state-space? In particular how do we encode the Q(x,a) data structure? We want a data structure that can take as input (i.e. be indexed by) two vectors x and a, both members of finite sets, and return a real number output Q(x,a).

The basic idea is that because x and a are members of finite sets, we can give each of them a unique ID number, and so we can give the (x,a) combination a unique ID too. Then we just have an array of some size N of real numbers, indexed by this ID.

x = (x1, x2, .. xn)
where each xi takes one of some finite set of values v0, v1, ... vm (in fact, the set of values, and the number of them, could be different for each dimension i).
e.g. x = (x1, .. x10) where for example x7 takes values -100, 0, or 100.

To help us enumerate the states so we can construct an ID number, we simply constrain the set of values to run from 0 to m (we can translate to/from the real values when we use them). So above we say x7 takes values 0,1,2.


Lookup tables and Generalisations

If x and a were not members of finite sets we would need to use a generalisation like a neural network.

Even with large finite, we may need generalisation.

To reduce the statespace size, we may need to be careful with our definition of state. Note that we can easily define too many possibilities, so that a huge chunk of the logical state space may not actually exist, e.g. all blank in chess board, or all kings. Could even have 90 percent of the lookup table unused.

Often we might make the world discrete and finite by dividing up the continuous input into a small number of coarse-grained divisions. But remember this in itself is a generalisation.



Example

Consider how we could enumerate the finite set of states defined as follows:

x = (x1, x2, x3)
where x1 takes values 0,1,
x2 takes values 0,1,2,
and x3 takes values 0,1.
We can sum this up by just having a set of variables saying how many values each dimension takes:
c1=2, c2=3, c3=2.

We can enumerate as follows:

x = (0,0,0)  id = 0
     0 0 1        1
     0 1 0        2
     0 1 1        3
     0 2 0        4
     0 2 1        5
     1 0 0        6
     1 0 1        7
     1 1 0        8
     1 1 1        9
     1 2 0        10
     1 2 1        11
Note that it is not binary.

Q. When is it binary?

Consider the last state (1,2,1).
The "1" means we have done the "0"'s in the first column, i.e. we have done 1.c2.c3,
plus the "2" means that within the "1"s we have already done an additional 2.c3,
plus the last "1" means we have done an additional 1.
So id(1,2,1) = 1.c2.c3 + 2.c3 + 1

In general, we can enumerate a state as follows:

int id ( state x )
{
    return x1c2c3 + x2c3 + x3
}




Enumeration scheme

States

Where x = (x1, x2, .. xn),
id(x) = x1c2..cn   + x2c3..cn   + ...   + x(n-1)cn   + xn

The number of possible states is:
nostates = c1c2...cn

The state IDs run from: 0 .. (nostates-1)


Q. The first state in the enumeration is:
( 0, 0, ..., 0 )
Show using the rule above that its ID is:
0

Q. The last state in the enumeration is:
( c1-1, c2-1, ..., cn-1 )
Show using the rule above that its ID is:
(c1c2...cn) - 1




Exercise - Write the state2id and the id2state functions.



Actions

Similar scheme. Action IDs run from: 0 .. (noactions-1)




States x Actions

(x,a) pairs can be enumerated:

(0,0)                              id = 0 + 0
(0,1)                                   0 + 1
...                                      ...
(0,noactions-1)                         0 + noactions-1
(1,0)                         1.noactions + 0
...                                      ...
(1,noactions-1)               1.noactions + noactions-1
(2,0)                         2.noactions + 0
...                                      ...
(nostates-1,noactions-1)     

 
The ID of an (x,a) pair is:
(id(x) * noactions) + id(a)

The number of (x,a) combinations is:
nostates*noactions

IDs run from:
0 .. (nostates*noactions)-1




Pseudo-code

So our function to access the State Space is:

real StateSpace :: at ( vector x, vector a )
{
    int id = (id(x) * noactions) + id(a)
    return ActualArray [ id ]
}

where ActualArray is simply an ordinary 1-dimensional array of real numbers:

real ActualArray [ nostates * noactions ]
Then we have one of these data structures to hold the Q-values:

StateSpace Q

and we can just use it like:

Q.at(x,a) = ((1-ALPHA) * Q.at(x,a)) + (ALPHA * new-estimate)

In the sample code you will see that I enumerated Actions x States rather than States x Actions as above.





ancientbrain.com      w2mind.org      humphrysfamilytree.com

On the Internet since 1987.      New 250 G VPS server.

Note: Links on this site to user-generated content like Wikipedia are highlighted in red as possibly unreliable. My view is that such links are highly useful but flawed.