Dr. Mark Humphrys

School of Computing. Dublin City University.





Part 3 - Implementation


8. Implementation

Having defined in the first part of this paper how a WWM scheme could work, and what it would be good for, we shall now move towards an actual implementation. First we need to define what the standard mode of operation will be. Will clients connect to servers for long periods of time? Or will they connect, carry out some transaction, and then disconnect? First note the differences with the Web:

Short, limited-length, client-server transactions

One way of breaking this indefinite loop down might be to treat each individual interaction with a server as a short, non-looping process. The server answers a short query with a response that is returned within a limited time. The server does not know when, if ever, it will receive the next query. Loops and more complex logic are implemented by other algorithms that call these primitive server queries repeatedly.

Client algorithm

The client software, which is driving a single top-level Mind and World, will implement a program something along these lines:

  1. For each server:
    • Connect to server - Start request - Tell server to start a new run for this client - Receive a unique run ID, so that you can identify yourself later - End request

  2. Repeat:
    1. Connect to World server - Start request - Send run ID to identify yourself - Query state - Get state   x   - End request
    2. Connect to Mind server - Start request - Send run ID to identify yourself - Send state   x   - Get action   a   - End request
    3. Connect to World server - Start request - Send run ID to identify yourself - Send action   a   - Get new state   y   - End request
    4. Connect to Mind server - Start request - Send run ID to identify yourself - Tell it new state   y   - Receive confirmation - End request

  3. For each server:
    • Connect to server - Start request - Send run ID to identify yourself - Send "End run" command - Receive confirmation - End request

To clarify, the non-technical client user will use the servers through standardised client software. It is the client software that will implement this overall control algorithm. The non-technical client user will see none of this.
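The control algorithm above can be sketched in a few lines. This is a minimal sketch only: in-process Python objects stand in for remote servers, and the names ToyWorld, ToyMind, run_client and all method names are hypothetical, since no transport or API is fixed at this point.

```python
import itertools

class ToyWorld:
    """Stand-in for a World server: the state is an integer counter per run."""
    def __init__(self):
        self._runs = {}
        self._ids = itertools.count(1)

    def new_run(self):
        run_id = f"w{next(self._ids)}"
        self._runs[run_id] = 0
        return run_id

    def get_state(self, run_id):
        return self._runs[run_id]

    def execute_action(self, run_id, a):
        self._runs[run_id] += a
        return self._runs[run_id]

    def end_run(self, run_id):
        del self._runs[run_id]
        return "Confirm"

class ToyMind:
    """Stand-in for a Mind server: tries to move the state up to 3."""
    def __init__(self):
        self._runs = {}
        self._ids = itertools.count(1)

    def new_run(self):
        run_id = f"m{next(self._ids)}"
        self._runs[run_id] = []          # states seen so far in this run
        return run_id

    def get_action(self, run_id, x):
        return 1 if x < 3 else 0         # 0 plays the role of "Do nothing"

    def inform_state(self, run_id, y):
        self._runs[run_id].append(y)
        return "Confirm"

    def end_run(self, run_id):
        del self._runs[run_id]
        return "Confirm"

def run_client(world, mind, steps):
    """Steps 1 to 3 of the client algorithm, for one World and one Mind."""
    w_id = world.new_run()               # 1. start a run on each server
    m_id = mind.new_run()
    trace = []
    for _ in range(steps):               # 2. the client decides how long to loop
        x = world.get_state(w_id)
        a = mind.get_action(m_id, x)
        y = world.execute_action(w_id, a)
        mind.inform_state(m_id, y)
        trace.append((x, a, y))
    world.end_run(w_id)                  # 3. end the run on each server
    mind.end_run(m_id)
    return trace
```

Each server keeps its own run ID for the client, so the same ToyWorld or ToyMind object could serve several clients at once without mixing up their runs.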

The server may be involved in many runs

The unique run ID is needed because the server may be simultaneously involved in many other runs with other clients. The server must keep the details of each run separate from the others. [Paulos and Canny, 1996] use a unique run ID like this in Internet robotics, where a World is servicing multiple Minds. Here also the Mind server may be involved in other Societies running at the same time. So the World-Wide-Mind is even stranger than having bits of your mind distributed around the world. It means that bits of your mind are simultaneously running in other minds.

The client controls time and may implement time-outs

The Mind server does not talk to the World server directly. Rather, the servers respond to short, finite-length requests. The client algorithm controls how many such requests occur. It is up to the client how long it runs the above loop for. The client may also implement a time-out. If the Mind takes too long to reply, the client could:

  1. Abort the Mind request, query the World again to make sure the state is up to date, then query the Mind again.

    or:

  2. Wait for the Mind to reply, so we know it is back online again. Then ignore its reply, query the World to get up to date, and then query the Mind again.

Similarly, if the World is down, the client may wait until it is back up, and then requery the Mind with the new state, instead of blindly executing the old, unexecuted action. This scheme could allow for a variable use of time, where the client may take days to come back to each server with the next request.
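Option 1 above might be sketched as follows. This is an illustration under assumptions: get_state and get_action are hypothetical callables wrapping the "Get state" and "Get action" queries, and the standard library thread pool is just one way to impose a time-out. Python threads cannot be aborted, so an abandoned request simply finishes in the background and its reply is ignored.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def get_action_with_timeout(get_state, get_action, timeout_s, max_attempts=3):
    """Return (x, a), re-reading the World state after each Mind time-out.

    Returns (x, None) if the Mind never replied in time; the caller then
    decides what to do (for example, send "Do nothing" to the World).
    """
    # One worker per attempt, so a retry is never queued behind a stuck request.
    pool = ThreadPoolExecutor(max_workers=max_attempts)
    try:
        for _ in range(max_attempts):
            x = get_state()                      # make sure the state is up to date
            future = pool.submit(get_action, x)  # query the Mind
            try:
                return x, future.result(timeout=timeout_s)
            except FutureTimeout:
                continue                         # abandon this request and retry
        return get_state(), None
    finally:
        pool.shutdown(wait=False)                # do not block on abandoned requests
```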

This is not a stimulus-response model

Continuing the discussion about time, it is important to reiterate that the above does not define the Mind as a stimulus-response machine. The Mind is simply receiving a periodic update about the state of the World. The Mind may run according to a different clock to the World:

  1. If the World changes slowly then a large number of   x   in a row may be the same. In this case the Mind is receiving more updates than it needs, and if the model demands that it return an action in response to each of these updates, then we will want to define as one of our actions an action for "Do nothing".

  2. Alternatively, just because the current state   x   is the parameter that is sent along with the "Get action" query, it does not follow that the action returned is a function of   x   alone. The Mind may suddenly start taking actions even though   x   has not changed. The Mind can be remembering all previous states, and making its judgement based on that knowledge. It can be building a world model. It can have internal clocks that cause it to change plans according to time-based action selection [Ring, 1992, Blumberg, 1994, Whitehead et al., 1993, McFarland, 1989]. It can be learning, starting with random actions, and changing its policy as it goes along. It can be engaged in long-term or short-term planning. It can be symbolic or non-symbolic.

MindAS server algorithm

The MindAS server responds to queries like any other server. Inside its "Get action" query is some complex logic interrogating a list of Mind servers to find a winning action. This may be a loop but, unlike the client, it will be a finite-length loop, not an indefinite-length loop.

Similarly, the MindAS server will receive an "Inform it about new state   y"   command after each action is executed. Inside this command it will send an "Inform it about new state" command to all of its subsidiary Mind servers, along with extra information that only the MindAS server knows, such as whether they were obeyed or not.

The MindAS server may also implement time-outs

The MindAS server may implement time-outs. It is periodically sending a query to all Mind servers. It cannot afford to go round them in rotation. It cannot afford to wait until Mind server M1 has returned before sending a request to M2. Instead it must send requests to all of them in parallel, and receive the replies as they come in. With multiple Mind servers there is a much greater chance of some being slow, offline, or even gone altogether (broken links). Any sensible MindAS server will implement a time-out. If some of the Mind servers do not respond within the time-out, it will make a decision based on whatever actions have come in.
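A sketch of such a parallel query with a time-out follows. It assumes each subsidiary Mind is reachable through a hypothetical callable that takes the state   x   and returns a pair (action, W). Letting through the reply with the highest W is only one possible decision rule, chosen here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def gather_suggestions(minds, x, timeout_s):
    """Query every Mind in parallel; keep only the replies that arrive in time.

    `minds` maps a (hypothetical) Mind server URL to a callable x -> (action, W).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(minds)))
    futures = {pool.submit(fn, x): url for url, fn in minds.items()}
    done, _not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)            # slow Minds are left to finish unheard
    replies = {}
    for f in done:
        if f.exception() is None:        # a failed Mind counts as no reply
            replies[futures[f]] = f.result()
    return replies

def pick_winner(replies):
    """Let through the action with the highest W; None if nobody replied."""
    if not replies:
        return None
    url = max(replies, key=lambda u: replies[u][1])
    return url, replies[url][0]
```

A Mind that is slow, offline or gone altogether then costs the MindAS server at most timeout_s seconds per step, however large its W would have been.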

The servers (and client software) may implement any general-purpose algorithm using the server queries

Clearly the above client algorithm pseudo-code could be made more efficient. Perhaps there is no need to connect to the Mind after each action - it can find out what happened (what the new state is) next time around the loop, when it is asked for a new action. And next time round there is no need to query the World again - we already know what the new state is from when we executed the action. But the point is that it is up to the client to write this program. It will not be laid down in the protocol. Similarly, a server may implement any algorithm it likes provided it responds to the set of queries expected of it.




9. List of server queries

The definition of the WWM comes down to this, the definition of the possible queries and responses of the servers. The client software may implement any general-purpose algorithm based on these queries and responses. The Mind servers, the World servers, the MindAS servers, the MindM servers and the WorldW servers may each implement any general-purpose algorithm based on these queries and responses, provided that they themselves respond to these queries.


World server

Requests, each listed with its argument data and its return data:

New run
  Arguments:
    1. (OPTIONAL) Client URL (client may be a real client, or another WorldW server)
    2. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the World server. These are parameters for the world, e.g.:
      1. synchronous or asynchronous
      2. shared or non-shared
      3. start new world or join existing world
      4. virtual world size
      5. parameters defining number and type of other objects in virtual world
  Returns:
    1. Confirm (robot now in use, new copy of virtual world set up for this client, new actor in existing virtual world set up, etc.)
    2. world run ID
    3. (OPTIONAL) world display URL
  or:
    1. Refusal (client blocked, client URL not valid, failure in following trail of credit from client URL, payment or authentication required, robot already in use, bad parameters)

Get display URL
  Arguments:
    1. world run ID
  Returns:
    1. (OPTIONAL) world display URL

"No operation" (Possibly used as a periodic clock timer, or just to inform the server that the client is still running.)
  Arguments:
    1. world run ID
  Returns:
    1. Confirm

Get state
  Arguments:
    1. world run ID
  Returns:
    1. x

Execute action
  Arguments:
    1. world run ID
    2. a
  Returns:
    1. y
    2. (OPTIONAL) Score (points scored by this single action, according to some scoring system at the World server).

    This score could be used to drive automated searches with no user interface.

Reset (Reset the world as it would be at the start of a run. e.g. We are trying to solve a problem. Previously we were just learning. Now we want to test our knowledge.)
  Arguments:
    1. world run ID
  Returns:
    1. Confirm (This may or may not reset the score.)

Reset score
  Arguments:
    1. world run ID
  Returns:
    1. Confirm

Get current score
  Arguments:
    1. world run ID
  Returns:
    1. (OPTIONAL) Score (total points scored so far in this run).

End run
  Arguments:
    1. world run ID
  Returns:
    1. Confirm (Display URL is removed, robot is freed for other use, etc.)
    2. (OPTIONAL) Score (total points scored over course of run).

Notes:

  1. The unique world run ID is known only to the client, and used to identify itself each time when it returns with a new query.
  2. The state and the action are indefinite-length, undefined streams of plain text (terminated by   </data>)   to be interpreted by the servers.
  3. The World should probably support an action for "Do nothing".

A WorldW server has the same interface as a World server.
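To make the World server queries concrete, here is a toy in-memory World that answers some of them. It is a sketch under assumptions: the class name GridWorldServer, the dictionary-shaped replies, the one-dimensional world and the scoring rule are all invented for illustration, and a real server would receive these requests over the network.

```python
import uuid

class GridWorldServer:
    """Toy World server: the actor walks along a line toward the last cell."""

    def __init__(self, max_runs=2):
        self.max_runs = max_runs
        self.runs = {}                   # world run ID -> details of that run

    def new_run(self, size=5):
        if len(self.runs) >= self.max_runs:
            return {"status": "Refusal", "reason": "world already in use"}
        if size < 2:
            return {"status": "Refusal", "reason": "bad parameters"}
        run_id = uuid.uuid4().hex        # known only to this client
        self.runs[run_id] = {"size": size, "pos": 0, "score": 0}
        return {"status": "Confirm", "run_id": run_id}

    def get_state(self, run_id):
        return str(self.runs[run_id]["pos"])      # state is plain text

    def execute_action(self, run_id, a):
        run = self.runs[run_id]
        step = {"left": -1, "right": 1}.get(a, 0)  # anything else: "Do nothing"
        run["pos"] = min(max(run["pos"] + step, 0), run["size"] - 1)
        points = 1 if run["pos"] == run["size"] - 1 else 0
        run["score"] += points
        return {"y": str(run["pos"]), "score": points}

    def reset(self, run_id):
        self.runs[run_id]["pos"] = 0     # this toy World keeps the score on reset
        return {"status": "Confirm"}

    def get_current_score(self, run_id):
        return self.runs[run_id]["score"]

    def end_run(self, run_id):
        run = self.runs.pop(run_id)      # frees the slot for another client
        return {"status": "Confirm", "score": run["score"]}
```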


Mind server

Requests, each listed with its argument data and its return data:

New run
  Arguments:
    1. (OPTIONAL) Client URL (client may be a real client, or another MindM server)
    2. (OPTIONAL) world display URL
    3. (OPTIONAL) world run ID
    4. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for the mind, e.g.:
      1. maximum allowable timeout before mind must return an action
  Returns:
    1. Confirm
    2. mind run ID
    3. (OPTIONAL) mind display URL (Mind may display at some URL information about how it is being used on this run, what it has learnt, etc.)
  or:
    1. Refusal (client blocked, client URL not valid, failure in following trail of credit from client URL, world display URL not valid, failure in following trail of credit from world display URL, payment or authentication required, bad parameters)

Get display URL
  Arguments:
    1. mind run ID
  Returns:
    1. (OPTIONAL) mind display URL

"No operation"
  Arguments:
    1. mind run ID
  Returns:
    1. Confirm

Ready to suggest action?
  Arguments:
    1. mind run ID
    2. (OPTIONAL) current state x
  Returns:
    1. Ready to suggest an action.
  or:
    1. Cannot suggest an action at this time or in this state (e.g. has terminated, or is waiting for pre-conditions to be met).

Get action
  Arguments:
    1. mind run ID
    2. x
  Returns:
    1. a
    2. (OPTIONAL) Q (predicted points that will be scored by this action, see below)
  or:
    1. Cannot suggest action at this time or in this state.

Note that the action "Do nothing" is not equivalent to "Cannot suggest action".

Inform it about state (Inform it what happened when the action was executed)
  Arguments:
    1. mind run ID
    2. y
    3. (OPTIONAL) Score (points scored by action according to World)
  Returns:
    1. Confirm
    2. (OPTIONAL) Q (points scored by this action, according to the Mind's way of scoring points, which might be different to how the World sees it).

Reset score
  Arguments:
    1. mind run ID
  Returns:
    1. Confirm

Get current score
  Arguments:
    1. mind run ID
    2. (OPTIONAL) Score (points scored so far in run according to World).
  Returns:
    1. (OPTIONAL) Score (points scored so far in run according to Mind).

End run
  Arguments:
    1. mind run ID
    2. (OPTIONAL) Score (total points scored in run according to World)
  Returns:
    1. Confirm
    2. (OPTIONAL) Score (total points scored in run according to Mind)


Notes:
  1. A Mind server may call other Mind servers, thus setting up its own run with them, and its own run ID. Presumably Minds will start other Minds with progressively smaller "maximum allowable timeout" parameters.


Additional MindL queries

A MindL server is a Mind server that learns, and supports the following additional queries:

Requests, each listed with its argument data and its return data:

New run - MindL arguments
  Arguments:
    1. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for the mind, e.g.:
      1. Numeric values of a set of rewards from which the mind will now start learning
      2. Q-Temperature
      3. Proposed length of Q learning run (for use in declining Q-Temperature)
      4. discounting factor (in Reinforcement Learning)
      Parameters for a neural network mind might include:
      1. no. of hidden units
      2. learning rate
    2. (OPTIONAL) Other Client URL. Say the server stores its learnt knowledge relative to each client. This parameter is to say "Give me the version of the Mind you have constructed by learning with the client at this URL."
  Returns:
    1. Confirm
  or:
    1. Refusal (bad parameters, other client URL not valid, no data saved relative to other client URL)

Get Q-Temperature
  Arguments:
    1. mind run ID
  Returns:
    1. (OPTIONAL) Q-Temperature

Reset Q-Temperature / World has changed
  Arguments:
    1. mind run ID
    2. (OPTIONAL) Proposed length of next Q learning run
  Returns:
    1. Confirm (will reset Q-Temperature to something sensible)

Send explicit Q-Temperature
  Arguments:
    1. mind run ID
    2. Q-Temperature
    3. (OPTIONAL) Proposed length of next Q learning run
  Returns:
    1. Confirm

Get action
  Arguments:
    1. mind run ID
    2. x
    3. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for this step only, e.g.:
      1. Q-Temperature
  Returns:
    1. a
    2. (OPTIONAL) Q (predicted points that will be scored by this action)
  or:
    1. Cannot suggest action at this time or in this state.


Additional Mindi queries

A Mindi server is a Mind server that accepts it may not be the only mind in the body, and supports the following additional queries:

Requests, each listed with its argument data and its return data:

New run - Mindi arguments
  Arguments:
    1. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for the mind, e.g.:
      1. mind "strength"
      2. W-Temperature
      3. Proposed length of W learning run
  Returns:
    1. Confirm
  or:
    1. Refusal (bad parameters)

Get mind strength
  Arguments:
    1. mind run ID
  Returns:
    1. (OPTIONAL) mind strength

Change mind strength (Could ask for a new instance of the mind with a different strength, or there might be some reason to keep the current instance and change its strength)
  Arguments:
    1. mind run ID
    2. mind strength
  Returns:
    1. Confirm

Get W-Temperature
  Arguments:
    1. mind run ID
  Returns:
    1. (OPTIONAL) W-Temperature

Reset W-Temperature / Collection has changed (new competing Minds, or old competing Minds have gone)
  Arguments:
    1. mind run ID
    2. (OPTIONAL) Proposed length of next W learning run
  Returns:
    1. Confirm (will reset W-Temperature to something sensible)

Send explicit W-Temperature
  Arguments:
    1. mind run ID
    2. W-Temperature
    3. (OPTIONAL) Proposed length of next W learning run
  Returns:
    1. Confirm

Get suggested action with values
  Arguments:
    1. mind run ID
    2. x
    3. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for this step only, e.g.:
      1. Q-Temperature
      2. W-Temperature
  Returns:
    1. (OPTIONAL) a (my suggested action)
    2. (OPTIONAL) Mind server URL (of the Mind I want to call now)
    3. Q (a measure of how good this action is - how much this action will benefit me).
    4. W (how much I am prepared to pay to win this competition). i.e. We must have some idea what will happen if we don't win.
  or:
    1. Cannot suggest action at this time or in this state.

The Mindi server must return either an action, or the URL of a server that will generate the action, or a "Cannot suggest action" message.

Get values for this action (How good/bad is this action)
  Arguments:
    1. mind run ID
    2. x
    3. a
    4. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for this step only, e.g.:
      1. Q-Temperature
      2. W-Temperature
  Returns:
    1. Q (How good is this action - How much will this action gain for you)
    2. W (How bad is this action - How much would you pay to stop this and execute your best action instead. How much do you lose by having this executed instead of your best action.)

It is possible for an action to have high Q and high W.

Inform it about winner (Inform it what happened)
  Arguments:
    1. mind run ID
    2. (OPTIONAL) boolean (whether it was obeyed). Why this might be optional: A Mind server in Hierarchical Q-learning was never even asked for an action, so we can't say it was or wasn't obeyed. But we still want to tell it that we took someone else's action ak and got to state y.
    3. (OPTIONAL) W (payment to the Mind for losing - see Economy of Mind)
    4. (OPTIONAL) Who won (to be precise, the Mind server URL of the winner). For use in Nested systems.
    5. (OPTIONAL) ak (the action that was executed). Why this might be optional: If it did not win, it may not understand the action that did. But it still wants to know that it did not win.
    6. y
    7. (OPTIONAL) Score (points scored by action according to World)
  Returns:
    1. Confirm
    2. (OPTIONAL) Q (how good this was, or how many points scored, for Mind).
    3. (OPTIONAL) W (estimate by Mind of how much it lost by this being executed)

This command helps the Mind learn how difficult it is to win when we are in state x, and how bad it is if someone else wins. The Mind may decide to increase the value of W next time round in this state.


Notes:
  1. Any Mind server may call another Mind server to get its action. Up until now, the Mind server was not involved in any competition, so it did not have to report to the client that it was calling another server. In response to "Get action", it just calls that other server and returns the action.

    If it is involved in a competition, however, it would be far more efficient to postpone calling the other Mind server until it has actually won the competition. So in this case it returns the other Mind URL to the client. If it wins, the client can send "Get action" to that other Mind.


Additional MindFeu queries

A MindFeu server is a Mind server that accepts Feudal commands of the form: "Take me to state c" [PhD, §18.2]. The other Mind servers have their own motivations and suggest actions according to them, and clients using them can then decide whether or not to use these suggestions. But a MindFeu server does not have its own goals, and is only used via this call by another Mind server which has goals:

Requests, each listed with its argument data and its return data:

Take me to state
  Arguments:
    1. mind run ID
    2. x
    3. destination state c
    4. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for this step only, e.g.:
      1. Q-Temperature
      2. W-Temperature
  Returns:
    1. (OPTIONAL) a (my suggested action)
    2. (OPTIONAL) Mind server URL (of the Mind I want to call now)
    3. (OPTIONAL) Q (how good a is for the purposes of getting from x to c)
    4. (OPTIONAL) W (how important it is to win the competition now, for the purposes of getting from x to c)

The MindFeu server must return either an action or the URL of a server that will generate the action.

How good/bad is this action (to take me to state c)
  Arguments:
    1. mind run ID
    2. x
    3. c
    4. a
    5. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the Mind server. These are parameters for this step only, e.g.:
      1. Q-Temperature
      2. W-Temperature
  Returns:
    1. (OPTIONAL) Q
    2. (OPTIONAL) W


Additional MindAS queries

A MindAS server is a Mind server that resolves competition between multiple subsidiary Mind servers. Either this is hidden from the client (and so the server just appears as an ordinary Mind server above), or else the client provides this list via a special constructor. Having provided the list via the constructor, the client thereafter uses the server just like an ordinary Mind server.

One interesting issue arises if we have a hierarchy of Action Selection competitions, in which the MindAS server appears as just another primitive Mindi server competing in a higher-level Action Selection competition: where does it get its W-values from? Does it pass upwards the W-values from the winning Mind server below it? Interesting as this is, it is a problem for the server author, not an issue for the WWM specification here. The MindAS server author must somehow use the queries defined here to gather information from its subsidiary Mind servers to compete at the higher level.

Requests, each listed with its argument data and its return data:

New run - AS arguments
  Arguments:
    1. (OPTIONAL) List of Mind server URLs
    2. (OPTIONAL) Open-ended set of arguments whose format is interpreted by the AS server. These are parameters for the AS mechanism, e.g.:
      1. which of a set of algorithms to use
  Returns:
    1. Confirm
  or:
    1. Refusal (mind URLs not valid, failure in following trail of credit from mind URLs, bad parameters)

Add mind to collection
  Arguments:
    1. mind run ID
    2. Mind server URL
  Returns:
    1. Confirm
  or:
    1. Refusal (mind URL not valid, failure in following trail of credit from mind URL)

Remove mind from collection
  Arguments:
    1. mind run ID
    2. Mind server URL
  Returns:
    1. Confirm


A MindM server may appear with the interface of any of these types of Mind server.



10. How to implement some existing agent architectures as networks of WWM servers

We now show how a number of existing models of agent minds can be implemented as networks of WWM servers using the server queries above.

Hand-coded program

A hand-coded mind program can clearly be implemented as a single Mind server, receiving   x   and returning   a.   There are a vast number of models of agent mind, whether hand-coded, learnt or evolved, that will repeatedly produce an action given a state. Most of these could be implemented as WWM servers without raising any particular issues apart from having to agree on the format of state and action with the World server.

We will not discuss any of these further, except where they raise particular issues with respect to the WWM. For example, below we will refer in detail to different models of Action Selection, because these raise particular WWM issues.

Initial test - Eliza Mind talks to Eliza World

An initial test of the model could be made by connecting two Eliza-type programs together to have a conversation. In this case   x   and   a   are both streams of text. The output   a   for one is the input   x   for the other. Which we regard as the "Mind" and which as the "World" under our scheme does not matter. Even in this initial test we could implement some advanced ideas, such as time-outs, and Mind servers keeping track of previous states. It also raises the issue of how a human could become part of the response of a World server or a Mind server.

The Subsumption Architecture

A Subsumption Architecture model [Brooks, 1986, Brooks, 1991] could be implemented as a hierarchy of MindM servers, each one building on the ones below it. Each one sends the current state   x   to the server below it, and then either uses their output or overrides it. So each Mind server sees state   x   and gets to respond. As in Brooks' model, a set of lower layers will still work if the higher layers are removed. On the WWM, there may be many choices for (remote, 3rd party) higher layers to add to a given collection of lower layers.
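The layering described here could be sketched as below. The Layer class and its override function are hypothetical names for illustration, and a local method call stands in for the "Get action" query that each MindM server would send to the server below it.

```python
class Layer:
    """One layer of a toy subsumption stack.

    `override` is a hypothetical function (x, lower_action) -> action or None,
    where None means "let the lower layer's action through".
    """
    def __init__(self, override, below=None):
        self.override = override
        self.below = below

    def get_action(self, x):
        # Send the current state x to the layer below, if there is one.
        lower = self.below.get_action(x) if self.below else None
        # Then either use its output or override it.
        mine = self.override(x, lower)
        return lower if mine is None else mine
```

For example, with wander = Layer(lambda x, lower: "forward") and avoid = Layer(lambda x, lower: "turn" if x == "obstacle" else None, below=wander), the client talks only to avoid. If avoid is removed, wander still works on its own.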

Serial models

In serial models, a mind server will "complete" its activity before another mind server will start [Singh, 1992, Tham and Prager, 1994, Wixson, 1991]. This can be driven by a master MindM server that passes control from server to server. This MindM server needs to know when each goal terminates, which requires it to have a lot of intelligence.

To reduce the demands on the intelligence of the master server, each server itself may know whether it is ready to execute or not (preconditions not true yet, or it has just completed its goal). The server can return this in response to the "Ready to suggest action?" query. Then the MindM server only needs to know the order of the chain of servers. The servers themselves tell it when it is time to switch to the next server.

Or we could avoid having a master MindM server altogether if each server, when its goal is completed, will pass all requests for actions thereafter on to its successor server (which it knows about). Then we simply interact with the Society through the first mind server in the chain.
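The second variant, where the master only knows the order of the chain, might look like this. It is a sketch under a simplifying assumption: a server that answers "Cannot suggest an action" is treated as having completed its goal, so the master moves on and never returns to it. The names serial_get_action, ReachTarget, ready and get_action are hypothetical.

```python
def serial_get_action(chain, position, x):
    """Advance along `chain` until a server is ready, then ask it for an action.

    `chain` is an ordered list of objects with hypothetical methods
    ready(x) -> bool ("Ready to suggest action?") and get_action(x).
    Returns (new_position, action); action is None once every remaining
    server has declined.
    """
    while position < len(chain):
        if chain[position].ready(x):
            return position, chain[position].get_action(x)
        position += 1                    # time to switch to the next server
    return position, None

class ReachTarget:
    """Toy Mind whose goal is complete once the state reaches `target`."""
    def __init__(self, target):
        self.target = target

    def ready(self, x):
        return x < self.target

    def get_action(self, x):
        return 1                         # always step the state upward
```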

Maes' Spreading Activation Networks

Maes' Spreading Activation Networks [Maes, 1989, Maes, 1989a] or Behavior Networks consist of a network of "servers" which are aware of their preconditions. Servers can be linked to from other servers that can help to make those preconditions come true, or be inhibited by other servers who will cause their preconditions to not hold. They can in turn link to other servers whose preconditions their behavior can affect. This might be implemented on the WWM by one server constructing the state   x   for the server it is calling, putting the preconditions into   x.  

Maes' spreading activation mechanism spreads excitation and inhibition from server to server. This might be implemented on the WWM using the "Change mind strength" message.

Reinforcement Learning

An ordinary Reinforcement Learning (RL) agent, which receives rewards and punishments as it acts [Kaelbling et al., 1996], can clearly be implemented as a single Mind server. For example a Q-learning agent [Watkins, 1989] builds up Q-values ("Quality"-values) of how good each action is in each state:   Q(x,a).   This is stored in a data structure inside the agent - either a straightforward lookup table, or else a generalisation such as a neural network. Then, given a state, the agent can produce an action based on these Q-values. This maps easily to the WWM model of a Mind server above, as does any similar notion of a state-space learner, e.g. [Clocksin and Moore, 1989].

When learning, the Q-learner can calculate its own reward based on   x,     a   and   y   [PhD, §2.1.3]. So long as the client informs it what state   y   resulted from the previous action   a,   it can calculate rewards, and learn.
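A tabular Q-learning Mind along these lines can be sketched as follows. The class and method names are hypothetical. The update used is the standard one-step Q-learning rule, and exploration here is epsilon-greedy for brevity, swapped in for the temperature-controlled exploration that the Q-Temperature parameter of the MindL queries suggests. The sketch also assumes that the Mind's own action was the one executed.

```python
import random
from collections import defaultdict

class QLearningMind:
    """Toy Mind server that learns Q(x, a) from its own reward function."""

    def __init__(self, actions, reward_fn, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.reward_fn = reward_fn       # the Mind's own reward r(x, a, y)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # lookup table keyed by (state, action)
        self.rng = random.Random(seed)
        self.last = None                 # (x, a) awaiting its resulting state y

    def get_action(self, x):
        if self.rng.random() < self.epsilon:
            a = self.rng.choice(self.actions)                     # explore
        else:
            a = max(self.actions, key=lambda b: self.q[(x, b)])   # exploit
        self.last = (x, a)
        return a

    def inform_state(self, y):
        """Called when the client reports the state y that resulted from a."""
        if self.last is None:
            return
        x, a = self.last
        r = self.reward_fn(x, a, y)
        best_next = max(self.q[(y, b)] for b in self.actions)
        target = r + self.gamma * best_next
        self.q[(x, a)] += self.alpha * (target - self.q[(x, a)])
```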

Hierarchical Q-Learning

Hierarchical Q-Learning [Lin, 1993] is a way of driving multiple Q-learners with a master Q-learner. It can be implemented on the WWM as follows. The client talks to a single MindAS server, sending it   x   and receiving   a.   The MindAS server talks to a number of Mind servers. These do not necessarily have to support all of the advanced queries of the Mindi server above. They may simply return an action, unaware that there are other minds in the body. The MindAS server maintains a table of values   Q(x,i)   where   i   is which Mind server to pick in state   x.   Initially its choices are random, but by its own reward function, noting what states the choices take us to, the MindAS server fills in values for   Q(x,i).   Having chosen   i,   it passes on the action suggested by Mind server   i   to the client.

In fact, to save on the number of server queries (which is a more serious issue on the WWM than in a self-contained system), we would do the following. Each time step, when presented with a state   x,   the MindAS server makes a decision based on its Q-values (initially random) and then, having picked action   i,   queries a single Mind server   i   for its action. Note then that the question of being obeyed or not obeyed does not apply to the other Mind servers - they were not even asked for an action on this step.

Whether they were asked for an action or not, the other Mind servers can still learn while in this system, if the MindAS server tells them what action it executed. i.e. They will need to support at least one of the advanced Mindi queries above. In general, any Mind server in a competition needs to be informed if it was obeyed, and what action was taken. Otherwise it may think that its action (which was not taken) led to the new state.

One interesting possibility with Hierarchical Q-Learning on the WWM is that the MindAS server need not know its list of Mind servers in advance. It can be passed this list by the client at startup, using the special constructor defined above. Of course then it will have to learn from scratch (i.e. it is sent a high Temperature parameter).

Another possibility is that the subsidiary Mind servers need not be Q-learners. They could be any type of Mind server, and the MindAS server simply learns which one to let through.
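The query-saving version of this scheme might be sketched as below. The class name, the epsilon-greedy exploration and the form of the reward function are assumptions made for illustration. The subsidiary Minds are hypothetical callables from state to action, which reflects the point that they need not be Q-learners.

```python
import random
from collections import defaultdict

class HierarchicalQSelector:
    """Toy MindAS server that learns Q(x, i): which Mind i to pick in state x."""

    def __init__(self, minds, reward_fn, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.minds = list(minds)         # callables: state x -> action a
        self.reward_fn = reward_fn       # the selector's own reward r(x, y)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # keyed by (state, mind index)
        self.rng = random.Random(seed)
        self.last = None

    def _best(self, x):
        return max(range(len(self.minds)), key=lambda i: self.q[(x, i)])

    def get_action(self, x):
        if self.rng.random() < self.epsilon:
            i = self.rng.randrange(len(self.minds))
        else:
            i = self._best(x)
        self.last = (x, i)
        return self.minds[i](x)          # only the chosen Mind is queried

    def inform_state(self, y):
        x, i = self.last
        r = self.reward_fn(x, y)
        target = r + self.gamma * self.q[(y, self._best(y))]
        self.q[(x, i)] += self.alpha * (target - self.q[(x, i)])
```

In this sketch the Minds that were not chosen are told nothing. A fuller version would also send each of them "Inform it about winner" so that they can keep learning.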

Action Selection with a single query or multiple queries

For many of the following models it will be useful to distinguish between two types of MindAS server:

  1. An ASs server makes a single query of each Mind server before making its decision.

  2. An ASm server makes multiple queries of each Mind server before making its decision.

Hierarchical Q-Learning is neither of these, because it does not even query all Mind servers once. Based on its   Q(x,i)   values it just makes one query of one Mind server.

Static measures of W

We will consider a number of schemes where Mind servers promote their actions with a weight W, or "W-value". Ideally the W-value will depend on the state   x   and will be higher or lower depending on how much the Mind server "cares" about winning the competition for this state [PhD, §5].

A static measure of the W-value [PhD, §5.3] is one in which the Mind server promotes its action with a value of W based on internal reasons, and independent of the competition. Any such method (including, say, W=Q) can clearly be implemented as a Mindi server. There will be a number of Mindi servers, and then a simple MindAS server which lets through the one with the highest W-value. This is an ASs server.

Dynamic measures of W

A dynamic measure of W [PhD, §5.5] is one in which the value of W changes depending on whether the Mind server was obeyed, and perhaps on who won instead if it did not. Clearly this is an ASs server that queries once, lets through the highest W, and then reports back afterwards to each server whether or not it was obeyed, using the WWM commands defined above. The server may then modify its W-value next time round in this state.

W-learning

W-learning [PhD, §5, §6] is a form of dynamic W where W is modified based on (i) whether we were obeyed or not, and (ii) what the new state y is as a result. This can clearly be implemented on the WWM as an ASs server. All the variations, such as Stochastic highest W [PhD, §6.5], and the winner not altering their W-value [PhD, §6.3], can clearly also be implemented using the WWM queries defined above.

In the pure form of W-learning [PhD, §6, §11, §13] the Minds do not even share the same suite of actions, and so, for example, cannot simply get together and negotiate to find the optimum action (see below). The inspiration was simply to see if competition could be resolved between Minds that had as little in common as possible. I was unable to give convincing examples where this might arise. Now with the WWM, and Minds coming from totally different origins, I hope it is clearer what the usefulness of this is. This is the type of AS method we will need for many situations on the WWM.

Strong and Weak Mind servers

[PhD, §8] showed how altering the absolute size of a Q-learning Mind server's rewards can change the size of the W-values it presents, without altering its policy. To be precise, we can multiply all the base Q-values by any constant   c   to produce an agent with the same policy but different W-values. As a result one could ask for a "strong" version of a Mind server, which would have the same policy as a weak version, but present larger W-values. This would be done by presenting the "mind strength" constant   c   as an argument to the server at startup. For a full explanation of how to carry out "Normalisation" and "Exaggeration" of the same basic behaviour, see [PhD, §C, §D]. Artificial Selection could search for good combinations of strong and weak servers by either:

  1. Automated search.

    or:

  2. By hand. Slowly increase or decrease the strength values, leaving all the details of the competition to be resolved automatically, and then observe the resulting global behaviour [PhD, §16.3, §17.2].
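The effect of the "mind strength" constant can be checked with a small sketch. It assumes Q-values held in a plain dictionary and a positive constant c, and it uses the loss Q(x, best) - Q(x, a) as a stand-in for the W-value; the function names are illustrative only.

```python
def policy(q):
    """Greedy policy: best action in each state. q maps state -> {action: Q}."""
    return {x: max(qa, key=qa.get) for x, qa in q.items()}


def scale(q, c):
    """Multiply every Q-value by the 'mind strength' constant c (c > 0)."""
    return {x: {a: c * v for a, v in qa.items()} for x, qa in q.items()}


def loss_if_disobeyed(q, x, a):
    """Q(x, best) - Q(x, a): what the Mind loses if action a is taken instead."""
    return max(q[x].values()) - q[x][a]


weak = {"s": {"left": 1.0, "right": 3.0}}
strong = scale(weak, 4)
```

Here `policy(weak)` and `policy(strong)` agree, while every loss reported by the strong version is 4 times the loss reported by the weak one.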

Matching World state definition with Mind state definition

Reinforcement Learning also shows us how we can get away with not defining a format for the state   x   and action   a.   In RL,   x   and   a   are abstractions, so that, for example, we define a model where executing action   a   in state   x   leads to state   y   with probability   Pxa(y)   - yet there is no need to actually define the format of   x   and   a   at this point.

Similarly with the WWM we leave state and action as undefined streams of text data terminated by   </data>.   How these streams are to be decoded is a matter for the World and Mind servers to agree among themselves. Servers will advertise (at their URLs) what format they expect and what format they generate, and others will act accordingly. Collections of servers that have incompatible formats, and therefore do not work, are not a problem. People will expect that vast numbers of Societies will not work at all, or work poorly, and the whole mindset will be to search for ones that work better than others, follow "Top 10" lists of good performers in a certain World, and so on. Societies that are not compatible, or even ones that are compatible but work poorly, will simply not be advertised.

"Islands" of compatible worlds

This does raise the question, though, of whether different sub-zones of the WWM will develop, each incompatible with the other. It seems that this will indeed happen. For any World, there will be an island of Minds that understand this World and interpret its definition of state   x   or some subset of   x.   If the World is popular, other Worlds might be built to the same specification, so that the same Mind can act in all of these Worlds [Ray, 1995]. There will be a (perhaps very large) "island" of compatible Worlds and Minds, separate from other islands built to different specifications.

The AS servers might be more independent of the World definition, so that the same AS server can be used in different "islands". The AS server will receive   x   and return   a,   but need not understand the structure of either, but just pass on   x   to the Mind servers that do understand it, make a decision as to who to pick based on, say, the highest W-value, and then return whatever meaningless stream of data they provide as the action   a.  

The "island" of the physical world

For real robots, since the real physical world is the same for everyone, one might think there would be just one island - so that any real-world Mind could act on any real-robot World server. Not so, of course, because how you sense the real world (state   x)   depends on what sensors the robot hardware possesses, and what format they deliver their input in. One could imagine, though, that there will be a separate island clustered around each robot make. For instance, Mind servers that will run on any Khepera robot. Mind servers that will run on any LEGO Mindstorms robot. Mind servers that will run on a Nomad robot that has certain specified add-ons.

So, in conclusion, the network of World-Wide-Minds will not be unified, but will consist of a number of separate incompatible islands.

Mind servers with different senses in the same Society

[PhD, §6.6] discussed the case where Mind servers have different senses, even within the same Society, which makes their competition even more confusing. Sometimes in a particular state they win, and sometimes, in what seems to be exactly the same state (but is perceived by another Mind server as a different state) they lose (because the other Mind server competed differently).

In a WWM implementation of this, the MindAS server may receive the full state from the World, and then send a different sub-space of that to each Mind server as its input state. This is actually what we did with Hierarchical Q-learning [PhD, §4.4]. Both W-learning with subspaces [PhD, §7, §8] and W-learning with full space [PhD, §10] can also clearly be implemented as ASs servers using the WWM primitives above.
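As a rough sketch of sending each Mind server its own sub-space of the state: the example below assumes, purely for illustration, that the full state is a dictionary of named senses (the WWM itself leaves the state format undefined) and that the MindAS server holds a hypothetical `senses` table listing which keys each Mind receives.

```python
def project(state, keys):
    """Return the sub-space of the full state that one Mind is allowed to see."""
    return {k: state[k] for k in keys}


def dispatch(full_state, senses):
    """Map each Mind name to the partial state the MindAS server would send it."""
    return {mind: project(full_state, keys) for mind, keys in senses.items()}


senses = {"forager": ["food", "nest"], "guard": ["predator"]}
calm = dispatch({"food": 3, "predator": 0, "nest": 7}, senses)
danger = dispatch({"food": 3, "predator": 1, "nest": 7}, senses)
```

The forager receives the same input in both cases although the guard does not, which is exactly the situation in which a Mind wins in one state and loses in what looks to it like the same state.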

Global Action Selection decisions

If Minds do share the same suite of actions, then we can make various global decisions. Say we have   n   Mind servers. Mind server   i's   preferred action is action ai.   Mind server   i   can quantify "how good" action   a   is in state   x   by returning:

Qi(x,a)
and can quantify "how bad" action   a   is in state   x   by returning:
Qi(x,ai) - Qi(x,a)

Then we have 4 basic approaches [PhD, §14]:

  1. Maximize the Best Happiness:
    MAXa MAXi   Qi(x,a)
    which is in fact the same as static W=Q above, and can be implemented as an ASs server, with just one query to each Mind server to get its best action and its Q-value.

  2. Minimize the Worst Unhappiness:
    MINa MAXi   ( Qi(x,ai) - Qi(x,a) )
    which is an ASm server, requiring multiple queries of each Mind server.

  3. Minimize Collective Unhappiness:
    MINa   [ SUMi ( Qi(x,ai) - Qi(x,a) ) ]
    which is an ASm server.

  4. Maximize Collective Happiness:
    MAXa   [ SUMi Qi(x,a) ]
    which is an ASm server.
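The four approaches can be written down directly. The sketch below assumes that the AS server has already collected, for the current state x, each Mind server's Q-values as a dictionary from action to Qi(x,a); the function names and the sample numbers are illustrative only.

```python
def best_action(q_i):
    """Mind i's preferred action ai in the current state."""
    return max(q_i, key=q_i.get)


def unhappiness(q_i, a):
    """Qi(x,ai) - Qi(x,a): how bad action a is for Mind i."""
    return q_i[best_action(q_i)] - q_i[a]


def maximize_best_happiness(qs, actions):
    return max(actions, key=lambda a: max(q[a] for q in qs))


def minimize_worst_unhappiness(qs, actions):
    return min(actions, key=lambda a: max(unhappiness(q, a) for q in qs))


def minimize_collective_unhappiness(qs, actions):
    return min(actions, key=lambda a: sum(unhappiness(q, a) for q in qs))


def maximize_collective_happiness(qs, actions):
    return max(actions, key=lambda a: sum(q[a] for q in qs))


actions = ["a", "b", "c"]
qs = [{"a": 11, "b": 0, "c": 6},    # Mind 1 strongly prefers a
      {"a": 0, "b": 10, "c": 6}]    # Mind 2 strongly prefers b
```

With these numbers the first method picks "a" (the single happiest Mind gets its way), while the other three pick the compromise action "c". Since SUMi Qi(x,ai) does not depend on a, the two collective methods rank actions identically; they differ only in how the quantities are gathered from the servers.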

Other Action Selection methods based on RL

There are a number of other related AS methods, which can all be implemented as WWM servers:

  1. Negotiated W-learning [PhD, §11] is an ASm method.

  2. Collective W-learning [PhD, §12.2] can be implemented by the MindAS server building up a table of   W(x,i)   - the combined loss that Mind server   i   causes for everyone else when it wins (each server reports back to the MindAS server their own loss W). Like Hierarchical Q-learning, this is neither ASs nor ASm. It chooses a Mind server based on its W-values table and then makes one query of one server to return its action. Then it sends a command to every server to tell it what happened, and adjusts its W-value according to the losses the servers report in their responses. All the variants of this can clearly be implemented as well, including Stochastic lowest W [PhD, §12.2.2] and Negotiated Collective W-learning [PhD, §12.2.4] (which is an ASm method).

  3. Collective Equality [PhD, §12.4] is an ASm method.

  4. Any form of scaling the W-value [PhD, §8.1.2] can be implemented as well, with extra complexity in the Mind server, but no need for extra server queries.

  5. As referenced in [PhD, §F], there are other measures of Happiness and Unhappiness that may or may not make sense, all of which can be implemented by single or repeated WWM queries.

Other parallel models

The DAMN Architecture [Rosenblatt, 1995, Rosenblatt and Thorpe, 1995] implements an action selection method similar to Maximize Collective Happiness. The Q-values of Mind server   i   are multiplied by weights wi which reflect the current priorities of the system. This could be implemented as an ASm server, where the AS server maintains a set of weights wi.
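The weighted form can be sketched in a few lines. This illustrates only the weighted-sum idea as described here, not the DAMN architecture's actual implementation; the function name and the sample weights are assumptions.

```python
def weighted_vote(qs, weights, actions):
    """Pick the action with the largest weighted sum SUMi wi * Qi(x,a)."""
    return max(actions, key=lambda a: sum(w * q[a] for w, q in zip(weights, qs)))


qs = [{"left": 1.0, "right": 0.0},
      {"left": 0.0, "right": 1.0}]
```

Changing the weights changes the outcome without touching the Mind servers: `weighted_vote(qs, [2, 1], ["left", "right"])` favours the first Mind, and `[1, 3]` favours the second.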

Product Maximize Collective Happiness [PhD, §15.5.2], adapted from Grefenstette's work [Grefenstette, 1992], can be implemented on the WWM as an ASm server.

A number of other authors [Aylett, 1995, Tyrrell, 1993, Whitehead et al., 1993, Karlsson, 1997, Ono et al., 1996] implement, using a variety of notations, one of the 4 basic AS methods defined above [see PhD, §15.4], though none, as far as I am aware, has tried a Minimize the Worst Unhappiness strategy.

The AS server remembering the winner

To reduce the number of server queries needed, the AS server may remember who won the state last time, and build up a table   k(x)   for the winner (or   a(x)   for compromise actions) so it does not have to run the competition again [PhD, §11]. Obviously this only works if the servers remain unchanged (they are not learning), and if the collection of servers remains unchanged (no new servers, or old ones leaving).

Dynamically changing collections

To continue that last point, the WWM server queries allow for a collection of Mind servers where new ones are added or old ones removed during the course of the run [PhD, §17.6]. The implementation of the "Add mind" and "Remove mind" commands in the MindAS server will then send a "Collection changed" message to all of its subsidiary Mind servers, to inform them that the competition has changed and they may have to re-learn their W-values.
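The winner table and its invalidation when the collection changes can be sketched together. The class below is a hypothetical illustration: `run_competition` stands for whatever (possibly expensive) series of server queries decides the winner, and clearing the whole table on every change is simply the most conservative choice.

```python
class WinnerCache:
    """Remembers k(x), the winning Mind for each state already contested."""

    def __init__(self, run_competition):
        self._run_competition = run_competition   # callable: (minds, x) -> winner
        self._minds = []
        self._k = {}

    def add_mind(self, mind):
        self._minds.append(mind)
        self._k.clear()        # collection changed: old winners may be stale

    def remove_mind(self, mind):
        self._minds.remove(mind)
        self._k.clear()

    def winner(self, x):
        # Only run the competition for states not seen since the last change.
        if x not in self._k:
            self._k[x] = self._run_competition(list(self._minds), x)
        return self._k[x]
```

A cache of this kind is only safe while the Mind servers are not learning; a learning server would need to be able to ask for its entries to be dropped as well.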

Nested Mind servers

Digney [Digney, 1996] defines Nested Q-learning, where each Mind in a collection is able to call on any of the others. Each Mind server has its own set of actions Qi(x,a) and a set of actions Qi(x,k) where action   k   means "do whatever server   k   wants to do" (as in Hierarchical Q-learning). Of course we already have in general that a MindM server can call other Mind servers. What is different here is:

  1. It learns how good it is to call other servers. To do this, it needs to be supplied with some extra information, such as who won.
  2. Because it learns, it can be supplied with the list of Mind servers at startup, rather than having it pre-coded.

In a WWM implementation, each Nested server has a list of Mind URLs, either hard-coded or passed to it at startup. So the Nested server looks like a MindAS server co-ordinating many Mind servers to make its decision. But of course it is not making the final decision. It is merely suggesting an action to the master MindAS server that coordinates the competition between the Nested servers themselves. When the master MindAS server is started up with a list of Mind servers, it passes the list to each of the servers.

Consider the number of server queries in a Nested WWM system. The master MindAS server is given   x   and asked for an action   a.   It sends   x   to each Mind server. Server   i   looks at its Q-values and either suggests an action directly, or returns the URL of some server   j.   Server   j   is not yet queried. We wait to see if server   i   can win the competition. (In fact, server   j   may already have been queried separately for its own action.) If server   i   wins, then we query server   j   and get an action, which may be "Do what server   k   does" and so on. As well as allowing the Nested server to return a Mind URL instead of an action, we also need the master server to tell it the Mind URL of the winner. Remember that in Hierarchical Q-learning, the master server needs to know who won, so it can put values on Q(x,i). But of course it knows itself who won. Here, the Nested server needs to know who won, so it can put values on Qi(x,k). So it needs to be told who won by the master server.
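The chain of queries that follows once server i has won can be sketched as a simple loop. Here `query(url, x)` is a hypothetical function standing for one server query, returning either ("action", a) or ("defer", url_of_other_server). The cycle check is an addition of this sketch, not something defined by the model above.

```python
def resolve_action(winner, x, query):
    """Follow 'do what server k does' replies until a real action is returned.

    Returns the action and the list of servers queried along the way.
    """
    seen = []
    url = winner
    while True:
        if url in seen:
            # Assumed safeguard: servers deferring to each other in a loop.
            raise RuntimeError("deferral cycle: " + " -> ".join(seen + [url]))
        seen.append(url)
        kind, value = query(url, x)
        if kind == "action":
            return value, seen
        url = value


# Toy replies: i defers to j, j defers to k, k acts.
replies = {"i": ("defer", "j"), "j": ("defer", "k"), "k": ("action", "north")}
```

In this toy case the winner i costs three queries after the competition itself, one for each server in the chain.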

Each server calling a different list of servers

In the basic model we just described, all Mind servers can call all other Mind servers in the list. But in fact, the list could be different for each server. Each server could hard-code its own list of servers that it may call, similar to how any hand-written MindM server hard-codes its list of servers. One source of confusion arises when we tell the server who won and pass it the URL of a server that is not in its list of possible servers to call.

Servers outside the AS loop

[PhD, §18.1] also shows how some of the Nested servers might actually be outside the Action Selection competition, and simply wait to be called by a server that is in the competition. I call these "passive" servers. We have the same with hand-coded MindM servers, where some Mind servers may have to wait to be called by others. A server may be "passive" in one Society and at the same time "active" (i.e. the server is in the Action Selection loop) in a different Society.

Feudal Mind servers

Watkins [Watkins, 1989] defines a Feudal (or "slave") Q-learner as one that accepts commands of the form "Take me to state   c".   On the WWM, these Feudal Mind servers will be driven by other Mind servers that actually have preferences about what goal state to get to. In Watkins' system, the command is part of the current state. Using the notation   (x,c),a -> (y,c)   the slave will receive rewards for transitions of the form:   (*,c),a -> (c,c)   So the master server drives the slave server by explicitly altering the state for it. We do not have to change our definition of the server above. It receives   x   and produces   a.   It is just that the server driving it is constructing the state   x   rather than simply passing it on from above.
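The first approach, where the command is folded into the state, can be sketched as below. The tuple representation of (x,c) and the reward size of 1.0 are assumptions made for the example.

```python
def slave_state(x, c):
    """The master builds the slave's state by attaching the command c to x."""
    return (x, c)


def slave_reward(state, action, next_state, r=1.0):
    """Reward transitions (*,c),a -> (c,c): the commanded state was reached."""
    (_, c), (y, c_next) = state, next_state
    return r if (c_next == c and y == c) else 0.0
```

The slave server itself is unchanged: it still receives a state and returns an action, and only the server that drives it knows that the state was constructed.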

Another possibility is that the real state and the command are explicitly separated in the server query, which is what we allowed for with the additional MindFeu queries above. Using either of these approaches, the WWM model allows for Mind servers that provide a service of taking one to explicit goal states. e.g. Moore [Moore, 1990] has a concept of an explicit goal state, and Kaelbling's Hierarchical Distance to Goal (HDG) algorithm [Kaelbling, 1993] addresses the issue of giving the server new explicit goal states at run-time.

The sub-symbolic Society of Mind

The Nested and Feudal models are combined in [PhD, Fig. 18.4] showing the general form of a Society of Mind based on Reinforcement Learning. It is suggested that Reinforcement Learning would be one of the most fruitful areas in which to begin implementing the WWM. That is, we shall begin with a sub-symbolic Society of Mind. Indeed, the whole model of a complex, overlapping, competing, duplicated Society of Mind that we have developed in this paper is based on the generalised form of a Society of Mind based on Reinforcement Learning.

More complex communication between Mind servers

Baum's Economy of Mind [Baum, 1996] has new Mind servers paying off the old ones to gain control. This can still be done through our model. The payments would be managed through the MindAS server, which receives payments through the W-value, and redistributes them through the "Inform it about winner" command.

Is this a sub-symbolic model?

So far we have only defined a protocol for conflict resolution where the AS server makes queries of the Minds for different numeric weights, e.g. "How much will you pay to stop this happening?". As discussed, we may need further protocols for more sophisticated, symbolic communication among Mind servers. We imagine that numeric weights will be easily generated by sub-symbolic Minds, and are harder to generate in symbolic Minds. This is because symbolic Minds often know what they want to do but not "how much" they want to do it. Sub-symbolic Minds, who prefer certain actions precisely because numbers for that action have somehow risen higher than numbers for other actions, may be able to say precisely "how much" they want to do something, and quantify how bad alternative actions would be [PhD, §5.2].

The Action Selection model may sometimes be limiting, though, in demanding that the Mind win the competition objectively. We cannot just say that "Mind M3 should always win when the World is in state Z". Instead, M3 has to win the competition in that state. The solution is to surround the Action Selection with a MindM server that ensures that M3 wins in state Z.

It may be that in the symbolic domain we will make a lot more use of MindM servers, and maybe even avoid Action Selection altogether. This might be a popular alternative to having Minds generate Weights to resolve competition. The drawback, of course, is that the MindM server needs a lot of intelligence. It needs to understand the goals of all the Mind servers. This relates to the "homunculus" problem, or the need for an intelligent headquarters, see [PhD, §5].




11. HTTP CGI using XML

We now ask what actual technology we should use to implement the WWM queries. I suggest one overriding objective:

  1. That the WWM server authors be required to know as little as possible to get their servers on the network.

The server authors are interested in AI, not necessarily in networks. They may only know AI programming languages such as LISP. They may have never written a network application, and they may not want to learn. If we accept this criterion, then we should seek a lowest-common-denominator approach that will enable AI researchers to put their minds and worlds online with the minimum of delay. Ideally, we would have the following:

  1. The WWM server authors can write their program in any programming language, according to any programming methodology, on any operating system.
  2. The WWM server authors do not have to install any new software, but can run their programs on existing Web servers.
  3. World server authors do not have to learn any particular language for describing a world, such as VRML.
  4. The WWM server authors do not have to learn any new programming language, such as Java, Perl, or any other language.
  5. The WWM server authors do not have to learn any new network or object-oriented programming techniques. If all they know how to do is read from stdin and write to stdout, that should be sufficient to write a WWM server.

As [Bosak and Bray, 1999] put it: "schemes that rely on complex, direct program-to-program interaction have not worked well in practice, because they depend on a uniformity of processing that does not exist." [Paulos and Canny, 1996] make a strong call for a lowest-common-denominator approach in remote access to robots.

HTTP CGI

It is clear what the lowest-common-denominator system is on the network today: the system by which thousands of programmers have put programs and scripts online that were never online before. It is CGI. It is proposed that the lowest-common-denominator implementation of the WWM be done using CGI across HTTP. Every AI programmer has access to an HTTP server with CGI, and every AI programmer can write a program that receives stdin and writes to stdout.

XML

The data transmitted would not be HTML, as in "normal" CGI scripts, but would rather be the server queries, responses, and associated data. It is proposed that this be encoded as text-based XML [Bosak and Bray, 1999] rather than in a binary format. The advantages would be:

  1. XML is human-readable text, so it can be read and altered by hand on any system. One does not have to go through any particular application. In particular, it is easy to get your own program (in any language) to generate XML text output.
  2. As for reading XML input, it is a standard format (open tag, close tag) so that XML parsers are available in most languages (and it is easy to parse yourself in any case).
  3. It can be transmitted by CGI now to and from existing Web servers, with no extra modification needed.
  4. It is easy to extend the tag definitions (and hence the server query definitions) in future, without breaking the old definitions.

XML encoding of server queries

An example of an XML-encoded query would be as follows. The client asks the mind server to set up a new run, telling it what world we will be running it in. HTTP CGI POST is used to send the data to the mind server, so that the mind server receives the following XML code on stdin:

<xml>
<query name="New run">
<data name="world run ID"> 40031 </data>
<data name="world display URL"> http://worldserver/currentruns/40031.html </data>
</query>
</xml>

The mind server replies, assigning the client a unique run ID. The server writes to stdout:

<xml>
<response name="New run">
<data name="mind run ID"> 5505 </data>
<data name="mind display URL"> http://mindserver/currentruns/5505.html </data>
</response>
</xml>

The client uses this unique ID in each subsequent request:

<xml>
<query name="Get action">
<data name="mind run ID"> 5505 </data>
<data name="state">
x
</data>
</query>
</xml>

The state is simply a series of text characters representing the state, terminated by the   </data>   tag. How to decode these characters is something that the servers have to agree among themselves. The mind server returns an action:

<xml>
<response name="Get action">
<data name="action">
a
</data>
</response>
</xml>

Again, the format of the action is something the servers have to agree on.
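To illustrate how little is needed on the server side, here is a minimal sketch of a Mind server handling the "Get action" query shown above. It assumes the tag layout of the examples in this section; the "error" data item, the `text/xml` content type and the function names are inventions of the sketch, not part of any defined WWM protocol.

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def parse_query(text):
    """Return (query name, {data name: stripped text}) from a WWM query."""
    query = ET.fromstring(text).find("query")
    data = {d.get("name"): (d.text or "").strip() for d in query.findall("data")}
    return query.get("name"), data


def format_response(name, data):
    """Build the XML text of a response carrying the given data items."""
    lines = ["<xml>", "<response name=%s>" % quoteattr(name)]
    for key, value in data.items():
        lines.append("<data name=%s> %s </data>" % (quoteattr(key), escape(str(value))))
    lines += ["</response>", "</xml>"]
    return "\n".join(lines)


def handle(text, choose_action):
    """Answer one query; choose_action maps the state text to an action text."""
    name, data = parse_query(text)
    if name == "Get action":
        return format_response(name, {"action": choose_action(data["state"])})
    return format_response(name, {"error": "unsupported query"})   # assumed reply


def main(choose_action):
    # Under CGI the POST body arrives on stdin and the reply goes to stdout,
    # preceded by a header line and a blank line.
    sys.stdout.write("Content-Type: text/xml\n\n")
    sys.stdout.write(handle(sys.stdin.read(), choose_action))
```

A real server would call `main` with its own decision function; everything specific to the AI program stays inside that function, and the state and action remain opaque text as far as this wrapper is concerned.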

"AIML"

When we have precisely expressed the server queries as XML, we might call the resulting markup language "AIML" or "AI Markup Language".

It has to be noted, though, that the name "AIML" is currently being used in a much more restricted domain: it is used by the ALICE chatbot project [alicebot.org] as a means of allowing users to define their own chatbots. In this case, XML is being used to define the program itself rather than the data being passed back and forth. A chatbot is only one form of many possible types of WWM server, so a name like "ChatML" or even "ElizaML" would have been much more appropriate than "AIML". But criticism of this point aside, one interesting thing about ALICE is that in trying to allow non-technical users to construct entire servers, in some ways it takes 3rd party involvement even further than envisaged in this work.

The Swarm project, which aims to provide a standardised simulator for multi-agent worlds, has also recently moved to XML because of the restrictions inherent in forcing the use of any particular programming language [Daniels, 1999]. Again, XML is used as a way of defining the environment and agents and avoiding programming, rather than as a way of communicating between separate minds and worlds. [Noda et al., 1998] is probably the closest previous work to the WWM, using a client-server design that is independent of programming language and transmits ASCII strings.

Addressing

All requests to a WWM server are requests to a CGI program on a Web server:

http://site/directory/program

All arguments (including the type of WWM request being sent) are passed in stdin.
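On the client side, sending a query then amounts to an HTTP POST of the XML text to the program's URL. The sketch below uses only the Python standard library; the UTF-8 encoding, the `text/xml` content type and the timeout are assumptions, and the URL in the usage line is the placeholder from the text, not a real server.

```python
import urllib.request


def build_request(url, xml_text):
    """Prepare an HTTP POST carrying a WWM query as the request body."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def send(url, xml_text, timeout=30):
    """Send the query and return the server's reply as text."""
    with urllib.request.urlopen(build_request(url, xml_text), timeout=timeout) as reply:
        return reply.read().decode("utf-8")


req = build_request("http://site/directory/program", "<xml></xml>")
```

Because the type of request travels inside the body, the same URL serves every WWM command, and the client code never changes when new commands are defined.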

Persistent CGI

Normally, CGI is 1 process per request. In a normal CGI request, input comes in on stdin, a new process is started, output is generated on stdout, and then the process terminates. Here, though, we have a WWM server program where we want to send it multiple requests at different times over the course of a long run. The question is: How do we maintain state in between requests?

  1. Save to disk and Restore - The first issue is whether we can maintain state at all. The lowest-common-denominator approach would be for the programmer to save the state of their program to disk after each WWM request, and then restore it (from disk) when the next request comes in. All programmers can do this, though it may be some work. It is also very inefficient, starting the program again from scratch for each new request. But with powerful machines, this might not matter greatly. The important point is that the AI programmer can save and restore the state without learning any new programming techniques.

  2. "Persistent CGI" - The second issue is whether an efficient WWM server can be built. For example, one where the "Start run" request starts a process running that does not terminate when the "Start run" request terminates. Later CGI requests simply talk to this independently-running process. Finally, the "End run" CGI request terminates the process.

    It is no surprise that this is actually a well-known issue with CGI programs, and there are many approaches to it discussed in the CGI community, especially where scripts talk to large databases. How exactly to do it depends on the programming language and server used, and there is no standard solution.
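The save-to-disk approach might look like the following sketch. It assumes that the program's state can be expressed as JSON and that each run has its own file; the file naming, the `steps` counter and the function names are all illustrative. Concurrent requests for the same run are not handled here.

```python
import json
import os
import tempfile


def load_state(path, default):
    """Restore the saved state for this run, or start from default."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return default


def save_state(path, state):
    """Write to a temporary file first, then rename it over the old state."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def handle_request(path, update):
    """One CGI invocation: restore, apply the request, save."""
    state = update(load_state(path, {"steps": 0}))
    save_state(path, state)
    return state
```

Each request pays for one read and one write of the whole state, which is the inefficiency noted above, but nothing beyond ordinary file handling is required of the programmer.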

It is not the job of the WWM to propose a solution to the problem of persistent CGI. WWM server authors should be able to use any of the technologies that the CGI community propose. Here we simply note that for an efficient WWM server the server author may need to learn some network programming, which is something we wanted to avoid. But this is not necessary to set up a WWM server at all, which can be done using the save-to-disk method.

The AI programmer might get the server online initially using save-to-disk, and maybe sometime later, after learning some network programming, convert to a persistent process. The client would not notice any difference (except a speed improvement).

Asynchronous worlds

The CGI model is client-driven. Servers only respond to client requests. So how could we have an asynchronous world - a World that changes even when no client is making requests to it? Note this is actually the same issue as persistent CGI above: How can we start a process that does not end when the "Start run" request ends, but that carries on, and only ends when a new "End run" request is made?

It is also the same issue as: If the client never issues an "End run" command, can we time it out if we have not received queries from it in a long time? This would be important to relinquish control of a shared resource, e.g. a robot [Stein, 1998]. This could still be client-driven: The old client gets timed out when the new client makes its "Start run" attempt.

There is a similar "quick and dirty" way of making an asynchronous world while still retaining the simple client-driven model: The next time the client makes a request, the server calculates the time since the last request, and runs the world forward that number of timesteps before replying. An alternative, but more complex way, would be for the server to have its own "clock-tick" client, which sends it a periodic clock tick using the "no op" command. Each time it receives a clock tick it can update the world.
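The "quick and dirty" catch-up method can be sketched as follows. The class and parameter names are hypothetical; the current time is passed in explicitly so that the logic is easy to test, where a real server would pass the system clock.

```python
class CatchUpWorld:
    """World that is advanced only when a request arrives."""

    def __init__(self, step_fn, state, tick_seconds, now):
        self.step_fn = step_fn            # advances the world by one timestep
        self.state = state
        self.tick_seconds = tick_seconds
        self.last_time = now

    def catch_up(self, now):
        """Run the world forward by the whole ticks elapsed since last time."""
        ticks = max(0, int((now - self.last_time) // self.tick_seconds))
        for _ in range(ticks):
            self.state = self.step_fn(self.state)
        # Keep the unused fraction of a tick for the next request.
        self.last_time += ticks * self.tick_seconds
        return ticks


world = CatchUpWorld(lambda s: s + 1, state=0, tick_seconds=2.0, now=100.0)
```

A request arriving at time 105.0 advances this toy world by two ticks before the reply is computed, and the remaining second is carried over to the next request.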






