Having defined in the first part of this paper how a WWM scheme could work, and what it would be good for, we shall now move towards an actual implementation. First we need to define what the standard mode of operation will be. Will clients connect to servers for long periods of time? Or will they connect, carry out some transaction, and then disconnect? First note the differences with the Web:
An explicit "End run" command might be inserted, perhaps by the client, or by a server when it has detected some condition. Or the loop could run for a pre-determined amount of time.
One way of breaking this indefinite loop down might be to treat each individual interaction with a server as a short, non-looping process. The server responds to a short query with a response that will return within a limited time. The server does not know when, if ever, it will receive the next query. Other algorithms implement loops and more complex logic repeatedly using these primitive server queries.
The client software, which is driving a single top-level Mind and World, will implement a program something along these lines:
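A minimal sketch of such a loop, in Python. Here world_request() and mind_request() are hypothetical helpers that each send one WWM query and return the parsed response, and "Execute action" is assumed to return the resulting new state (as the efficiency discussion below relies on):

# Hedged sketch of the client's top-level control loop. world_request()
# and mind_request() are hypothetical helpers that each send one short
# WWM query to the World or Mind server and return the parsed response.

def run_society(world_request, mind_request, max_steps=1000):
    # Start a run on each server and remember the unique run IDs.
    world_id = world_request("New run")["world run ID"]
    mind_id = mind_request("New run")["mind run ID"]
    for _ in range(max_steps):
        x = world_request("Get state", run_id=world_id)["state"]
        a = mind_request("Get action", run_id=mind_id, state=x)["action"]
        # Assumption: executing the action returns the new state y.
        y = world_request("Execute action", run_id=world_id, action=a)["state"]
        # Tell the Mind what actually happened, so it can learn.
        mind_request("Inform it about state", run_id=mind_id, state=y)
    world_request("End run", run_id=world_id)
    mind_request("End run", run_id=mind_id)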
To clarify, the non-technical client user will use the servers through standardised client software. It is the client software that will implement this overall control algorithm. The non-technical client user will see none of this.
The unique run ID is needed because the server may be simultaneously involved in many other runs with other clients. The server must keep the details of each run separate. [Paulos and Canny, 1996] use a unique run ID like this in Internet robotics, where a World is servicing multiple Minds. Here the Mind server may likewise be involved in other Societies running at the same time. So the World-Wide-Mind is even stranger than having bits of your mind distributed around the world. It means that bits of your mind are simultaneously running in other minds.
The Mind server does not talk to the World server directly. Rather, the servers respond to short, finite-length requests. The client algorithm controls how many such requests occur. It is up to the client how long it runs the above loop for. The client may also implement a time-out. If the Mind takes too long to reply, the client could:
or:
Similarly, if the World is down, the client may wait until it is back up, and then requery the Mind with the new state, instead of blindly executing the old, unexecuted action. This scheme could allow for a variable use of time, where the client may take days to come back to each server with the next request.
Continuing the discussion about time, it is important to reiterate that the above does not define the Mind as a stimulus-response machine. The Mind is simply receiving a periodic update about the state of the World. The Mind may run according to a different clock to the World:
The MindAS server responds to queries like any other server. Inside its "Get action" query is some complex logic interrogating a list of Mind servers to find a winning action. This may be a loop but, unlike the client, it will be a finite-length loop, not an indefinite-length loop.
Similarly, the MindAS server will receive an "Inform it about new state y" command after each action is executed. Inside this command it will send an "Inform it about new state" command to all of its subsidiary Mind servers, along with extra information that only the MindAS server knows, such as whether they were obeyed or not.
The MindAS server may implement time-outs. It is periodically sending a query to all Mind servers. It cannot afford to go round them in rotation. It cannot afford to wait until Mind server M1 has returned before sending a request to M2. Instead it must send requests to all of them in parallel, and receive the replies as they come in. With multiple Mind servers there is a much greater chance of some being slow, offline, or even gone altogether (broken links). Any sensible MindAS server will implement a time-out. If some of the Mind servers do not respond within the time-out, it will make a decision based on whatever actions have come in.
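A sketch of that fan-out, assuming a hypothetical query_mind() helper that performs one blocking "Get action" request to one Mind server:

from concurrent.futures import ThreadPoolExecutor, wait

# Hedged sketch: query all subsidiary Mind servers in parallel and
# decide from whatever replies arrive within the time-out.

def collect_actions(mind_urls, state, query_mind, timeout=2.0):
    pool = ThreadPoolExecutor(max_workers=len(mind_urls))
    futures = {pool.submit(query_mind, url, state): url for url in mind_urls}
    done, not_done = wait(futures, timeout=timeout)
    # Abandon slow, offline or gone servers; keep the replies that came in.
    pool.shutdown(wait=False, cancel_futures=True)
    return {futures[f]: f.result() for f in done if f.exception() is None}

The MindAS server then runs its competition over whatever this returns.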
Clearly the above client algorithm pseudo-code could be made more efficient. Perhaps there is no need to connect to the Mind after each action - it can find out what happened (what the new state is) next time around the loop, when it is asked for a new action. And next time round there is no need to query the World again - we already know the new state from when we executed the action. But the point is that it is up to the client to write this program. It will not be laid down in the protocol. Similarly, a server may implement any algorithm it likes, provided it responds to the set of queries expected of it.
The definition of the WWM comes down to this: the definition of the possible queries and responses of the servers. The client software may implement any general-purpose algorithm based on these queries and responses. The Mind servers, the World servers, the MindAS servers, the MindM servers and the WorldW servers may each implement any general-purpose algorithm based on these queries and responses, provided that they themselves respond to these queries.
Request name | Argument data | Return data |
New run | | |
Get display URL | | |
"No operation" (Possibly used as a periodic clock timer, or just to inform the server that the client is still running.) | | |
Get state | | |
Execute action | | |
Reset (Reset the world as it would be at the start of a run. e.g. We are trying to solve a problem. Previously we were just learning. Now we want to test our knowledge.) | | |
Reset score | | |
Get current score | | |
End run | | |
Notes:
A WorldW server has the same interface as a World server.
Request name | Argument data | Return data |
New run | | |
Get display URL | | |
"No operation" | | |
Ready to suggest action? | | |
Get action | | Note that the action "Do nothing" is not equivalent to "Cannot suggest action". |
Inform it about state (Inform it what happened when the action was executed) | | |
Reset score | | |
Get current score | | |
End run | | |
A MindL server is a Mind server that learns, and supports the following additional queries:
Request name | Argument data | Return data |
New run - MindL arguments | | |
Get Q-Temperature | | |
Reset Q-Temperature / World has changed | | |
Send explicit Q-Temperature | | |
Get action | | |
A Mindi server is a Mind server that accepts it may not be the only mind in the body, and supports the following additional queries:
Request name | Argument data | Return data |
New run - Mindi arguments | | |
Get mind strength | | |
Change mind strength (Could ask for a new instance of the mind with a different strength, or there might be some reason to keep the current instance and change its strength) | | |
Get W-Temperature | | |
Reset W-Temperature / Collection has changed (new competing Minds, or old competing Minds have gone) | | |
Send explicit W-Temperature | | |
Get suggested action with values | | The Mindi server must return either an action, or the URL of a server that will generate the action, or a "Cannot suggest action" message. |
Get values for this action (How good/bad is this action) | | It is possible for an action to have high Q and high W. |
Inform it about winner (Inform it what happened) | | This command helps the Mind learn how difficult it is to win when we are in state x, and how bad it is if someone else wins. The Mind may decide to increase the value of W next time round in this state. |
Notes:
If the Mindi server is involved in a competition, however, it would be far more efficient to postpone calling the other Mind server until it has actually won the competition. So in this case it returns the other Mind's URL to the client. If it wins, the client can send "Get action" to that other Mind.
A MindFeu server is a Mind server that accepts Feudal commands of the form: "Take me to state c" [PhD, §18.2]. The other Mind servers have their own motivations and suggest actions according to them, and clients using them can then decide whether or not to use these suggestions. But a MindFeu server does not have its own goals, and is only used via this call by another Mind server which has goals:
Request name | Argument data | Return data |
Take me to state | | The MindFeu server must return either an action or the URL of a server that will generate the action. |
How good/bad is this action (to take me to state c) | | |
A MindAS server is a Mind server that resolves competition between multiple subsidiary Mind servers. Either this is hidden from the client (and so the server just appears as an ordinary Mind server above), or else the client provides this list via a special constructor. Having provided the list via the constructor, the client thereafter uses the server just like an ordinary Mind server.
One interesting issue: if we have a hierarchy of Action Selection competitions, so that the MindAS server appears as just another primitive Mindi server competing in a higher-level Action Selection competition, where does it get its W-values from? Does it pass upwards the W-values from the winning Mind server below it? Interesting as this is, it is a problem for the server author, not an issue for the WWM specification here. The MindAS server author must somehow use the queries defined here to gather information from its subsidiary Mind servers to compete at the higher level.
Request name | Argument data | Return data |
New run - AS arguments | | |
Add mind to collection | | |
Remove mind from collection | | |
A MindM server may appear with the interface of any of these types of Mind server.
We now show how a number of existing models of agent minds can be implemented as networks of WWM servers using the server queries above.
A hand-coded mind program can clearly be implemented as a single Mind server, receiving x and returning a. There are a vast number of models of agent mind, whether hand-coded, learnt or evolved, that will repeatedly produce an action given a state. Most of these could be implemented as WWM servers without raising any particular issues apart from having to agree on the format of state and action with the World server.
We will not discuss any of these further, except where they raise particular issues with respect to the WWM. For example, below we will refer in detail to different models of Action Selection, because these raise particular WWM issues.
An initial test of the model could be by connecting two Eliza-type programs together to have a conversation. In this case x and a are both streams of text. The output a for one is the input x for the other. Which we regard as the "Mind" and which as the "World" under our scheme does not matter. Even in this initial test we could implement some advanced ideas, such as time-outs, and Mind servers keeping track of previous states. It also raises the issue of how a human could become part of the response of a World server or a Mind server.
A Subsumption Architecture model [Brooks, 1986, Brooks, 1991] could be implemented as a hierarchy of MindM servers, each one building on the ones below it. Each one sends the current state x to the server below it, and then either uses their output or overrides it. So each Mind server sees state x and gets to respond. As in Brooks' model, a set of lower layers will still work if the higher layers are removed. On the WWM, there may be many choices for (remote, 3rd party) higher layers to add to a given collection of lower layers.
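A sketch of one such layer, assuming a hypothetical get_action(url, state) helper that sends "Get action" to a Mind server; the condition and action here are placeholders:

# Hedged sketch of one subsumption-style MindM layer: send the state to
# the layer below, then either pass its action through or override it.

def layer_get_action(state, lower_layer_url, get_action):
    a_below = get_action(lower_layer_url, state)  # lower layers still run
    if "obstacle" in state:       # placeholder for this layer's competence
        return "turn-left"        # placeholder override action
    return a_below                # otherwise defer to the layers below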
In serial models, a mind server will "complete" its activity before another mind server starts [Singh, 1992, Tham and Prager, 1994, Wixson, 1991]. This can be driven by a master MindM server that passes control from server to server. This MindM server needs to know when each goal terminates, which requires it to have a lot of intelligence.
To reduce the demands on the intelligence of the master server, each server itself may know whether it is ready to execute or not (preconditions not true yet, or it has just completed its goal). The server can return this in response to the "Ready to suggest action?" query. Then the MindM server only needs to know the order of the chain of servers. The servers themselves tell it when it is time to switch to the next server.
Or we could avoid having a master MindM server altogether if each server, when its goal is completed, will pass all requests for actions thereafter on to its successor server (which it knows about). Then we simply interact with the Society through the first mind server in the chain.
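A sketch of the middle scheme, where the master MindM server only knows the order of the chain; ready() and get_action() are hypothetical helpers wrapping "Ready to suggest action?" and "Get action":

# Hedged sketch: a master MindM server driving a fixed chain of Mind
# servers, switching to the next server when the current one declines.

def serial_get_action(chain, state, ready, get_action):
    for url in chain:
        if ready(url, state):   # preconditions hold, goal not yet complete
            return get_action(url, state)
    return None                 # no server in the chain is ready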
Maes' spreading activation mechanism spreads excitation and inhibition from server to server. This might be implemented on the WWM using the "Change mind strength" message.
An ordinary Reinforcement Learning (RL) agent, which receives rewards and punishments as it acts [Kaelbling et al., 1996], can clearly be implemented as a single Mind server. For example a Q-learning agent [Watkins, 1989] builds up Q-values ("Quality"-values) of how good each action is in each state: Q(x,a). This is stored in a data structure inside the agent - either a straightforward lookup table, or else a generalisation such as a neural network. Then, given a state, the agent can produce an action based on these Q-values. This maps easily to the WWM model of a Mind server above, as does any similar notion of a state-space learner, e.g. [Clocksin and Moore, 1989].
When learning, the Q-learner can calculate its own reward based on x, a and y [PhD, §2.1.3]. So long as the client informs it what state y resulted from the previous action a, it can calculate rewards, and learn.
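For example, a tabular one-step Q-learning update inside such a Mind server might look like this sketch, where reward() is the server's own reward function over (x, a, y):

# Hedged sketch of a tabular Q-learning update. Q maps (state, action)
# pairs to values; actions is the suite of possible actions.

def q_update(Q, x, a, y, actions, reward, alpha=0.1, gamma=0.9):
    best_next = max(Q.get((y, b), 0.0) for b in actions)
    target = reward(x, a, y) + gamma * best_next
    Q[(x, a)] = (1 - alpha) * Q.get((x, a), 0.0) + alpha * target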
Hierarchical Q-Learning [Lin, 1993] is a way of driving multiple Q-learners with a master Q-learner. It can be implemented on the WWM as follows. The client talks to a single MindAS server, sending it x and receiving a. The MindAS server talks to a number of Mind servers. These do not necessarily have to support all of the advanced queries of the Mindi server above. They may simply return an action, unaware that there are other minds in the body. The MindAS server maintains a table of values Q(x,i) where i is which Mind server to pick in state x. Initially its choices are random, but by its own reward function, noting what states the choices take us to, the MindAS server fills in values for Q(x,i). Having chosen i, it passes on the action suggested by Mind server i to the client.
In fact, to save on the number of server queries (which is a more serious issue on the WWM than in a self-contained system), we would do the following. Each time step, when presented with a state x, the MindAS server makes a decision based on its Q-values (initially random) and then, having picked action i, queries a single Mind server i for its action. Note then that the question of being obeyed or not obeyed does not apply to the other Mind servers - they were not even asked for an action on this step.
Whether they were asked for an action or not, the other Mind servers can still learn while in this system, if the MindAS server tells them what action it executed. i.e. They will need to support at least one of the advanced Mindi queries above. In general, any Mind server in a competition needs to be informed if it was obeyed, and what action was taken. Otherwise it may think that its action (which was not taken) led to the new state.
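A sketch of this MindAS server, with hypothetical get_action() and inform() helpers wrapping the corresponding WWM queries:

import random

# Hedged sketch of a Hierarchical Q-Learning MindAS server. Q maps
# (state, i) to the value of letting Mind server i act in that state.

def hq_get_action(Q, x, mind_urls, get_action, epsilon=0.1):
    if random.random() < epsilon:       # explore: pick a random server
        i = random.randrange(len(mind_urls))
    else:                               # exploit the learnt Q(x,i) values
        i = max(range(len(mind_urls)), key=lambda j: Q.get((x, j), 0.0))
    return i, get_action(mind_urls[i], x)   # only server i is queried

def hq_inform_all(mind_urls, winner, action, y, inform):
    for i, url in enumerate(mind_urls):
        # Every server is told what was actually executed, so none wrongly
        # credits its own (unexecuted) suggestion for the new state y.
        inform(url, action=action, new_state=y, obeyed=(i == winner))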
One interesting possibility with Hierarchical Q-Learning on the WWM is that the MindAS server need not know its list of Mind servers in advance. It can be passed this list by the client at startup, using the special constructor defined above. Of course then it will have to learn from scratch (i.e. it is sent a high Temperature parameter).
Another possibility is that the subsidiary Mind servers need not be Q-learners. They could be any type of Mind server, and the MindAS server simply learns which one to let through.
For many of the following models it will be useful to distinguish between two types of MindAS server: an ASs server, which makes only a single query of each Mind server for a given state x; and an ASm server, which makes multiple queries of each Mind server for a given state (for example, asking each Mind server to evaluate every action).
Hierarchical Q-Learning is not either of these because it does not even query all Mind servers once. Based on its Q(x,i) values it just makes one query of one Mind server.
We will consider a number of schemes where Mind servers promote their actions with a weight W, or "W-value". Ideally the W-value will depend on the state x and will be higher or lower depending on how much the Mind server "cares" about winning the competition for this state [PhD, §5].
A static measure of the W-value [PhD, §5.3] is one in which the Mind server promotes its action with a value of W based on internal reasons, and independent of the competition. Any such method (including, say, W=Q) can clearly be implemented as a Mindi server. There will be a number of Mindi servers, and then a simple MindAS server which lets through the one with the highest W-value. This is an ASs server.
A dynamic measure of W [PhD, §5.5] is one in which the value of W changes depending on whether the Mind server was obeyed, and perhaps on who won instead if it did not. Clearly this is an ASs server that queries once, lets through the highest W, and then reports back afterwards to each server whether or not it was obeyed, using the WWM commands defined above. The server may then modify its W-value next time round in this state.
W-learning [PhD, §5, §6] is a form of dynamic W where W is modified based on (i) whether we were obeyed or not, and (ii) what the new state y is as a result. This can clearly be implemented on the WWM as an ASs server. All the variations, such as Stochastic highest W [PhD, §6.5], and the winner not altering their W-value [PhD, §6.3], can clearly also be implemented using the WWM queries defined above.
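All of these variants share one ASs skeleton, sketched here with hypothetical suggest() and inform_winner() helpers wrapping "Get suggested action with values" and "Inform it about winner":

# Hedged sketch of an ASs server resolving one state x: a single query
# to each Mind server, let through the highest W, then report back.

def ass_resolve(mind_urls, x, suggest, inform_winner):
    bids = {url: suggest(url, x) for url in mind_urls}   # (action, W) each
    winner = max(bids, key=lambda url: bids[url][1])     # highest W wins
    action = bids[winner][0]
    for url in mind_urls:
        inform_winner(url, winner=winner, action=action,
                      obeyed=(url == winner))
    return action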
In the pure form of W-learning [PhD, §6, §11, §13] the Minds do not even share the same suite of actions, and so, for example, cannot simply get together and negotiate to find the optimum action (see below). The inspiration was simply to see if competition could be resolved between Minds that had as little in common as possible. I was unable to give convincing examples where this might arise. Now with the WWM, and Minds coming from totally different origins, I hope it is clearer what the usefulness of this is. This is the type of AS method we will need for many situations on the WWM.
[PhD, §8] showed how altering the absolute size of a Q-learning Mind server's rewards can change the size of the W-values it presents, without altering its policy. To be precise, we can multiply all the base Q-values by any constant c to produce an agent with the same policy but different W-values. As a result one could ask for a "strong" version of a Mind server, which would have the same policy as a weak version, but present larger W-values. This would be done by presenting the "mind strength" constant c as an argument to the server at startup. For a full explanation of how to carry out "Normalisation" and "Exaggeration" of the same basic behaviour, see [PhD, §C, §D]. Artificial Selection could search for good combinations of strong and weak servers by either:
or:
Reinforcement Learning also shows us how we can get away with not defining a format for the state x and action a. In RL, x and a are abstractions, so that, for example, we define a model where executing action a in state x leads to state y with probability Pxa(y) - yet there is no need to actually define the format of x and a at this point.
Similarly with the WWM we leave state and action as undefined streams of text data terminated by </data>. How these streams are to be decoded is a matter for the World and Mind servers to agree among themselves. Servers will advertise (at their URLs) what format they expect and what format they generate, and others will act accordingly. Collections of servers that have incompatible formats, and therefore do not work, are not a problem. People will expect that vast numbers of Societies will not work at all, or work poorly, and the whole mindset will be to search for ones that work better than others, follow "Top 10" lists of good performers in a certain World, and so on. Societies that are not compatible, or even ones that are compatible but work poorly, will simply not be advertised.
This does raise the question, though, of whether different sub-zones of the WWM will develop, each incompatible with the other. It seems that this will indeed happen. For any World, there will be an island of Minds that understand this World and interpret its definition of state x or some subset of x. If the World is popular, other Worlds might be built to the same specification, so that the same Mind can act in all of these Worlds [Ray, 1995]. There will be a (perhaps very large) "island" of compatible Worlds and Minds, separate from other islands built to different specifications.
The AS servers might be more independent of the World definition, so that the same AS server can be used in different "islands". The AS server will receive x and return a, but need not understand the structure of either, but just pass on x to the Mind servers that do understand it, make a decision as to who to pick based on, say, the highest W-value, and then return whatever meaningless stream of data they provide as the action a.
For real robots, since the real physical world is the same for everyone, one might think there would be just one island - so that any real-world Mind could act on any real-robot World server. Not so, of course, because how you sense the real world (state x) depends on what sensors the robot hardware possesses, and what format they deliver their input in. One could imagine, though, that there will be a separate island clustered around each robot make. For instance, Mind servers that will run on any Khepera robot. Mind servers that will run on any LEGO Mindstorms robot. Mind servers that will run on a Nomad robot that has certain specified add-ons.
So, in conclusion, the network of World-Wide-Minds will not be unified, but will consist of a number of separate incompatible islands.
[PhD, §6.6] discussed how Mind servers may have different senses, even within the same Society, which makes their competition even more confusing. Sometimes in a particular state they win; and sometimes, in what seems to them to be exactly the same state (but is perceived by another Mind server as a different state), they lose, because the other Mind server competed differently.
In a WWM implementation of this, the MindAS server may receive the full state from the World, and then send a different sub-space of that to each Mind server as its input state. This is actually what we did with Hierarchical Q-learning [PhD, §4.4]. Both W-learning with subspaces [PhD, §7, §8] and W-learning with full space [PhD, §10] can also clearly be implemented as ASs servers using the WWM primitives above.
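A sketch of the subspace idea; the comma-separated state format and the per-server field lists are assumptions for illustration:

# Hedged sketch: the MindAS server projects the full World state onto a
# different sub-space for each subsidiary Mind server.

def project_state(full_state, fields):
    parts = full_state.split(",")
    return ",".join(parts[i] for i in fields)

subspaces = {                              # hypothetical sense lists
    "http://mindserver1/": [0, 1],
    "http://mindserver2/": [1, 2, 3],
}

def states_for_minds(full_state):
    return {url: project_state(full_state, f)
            for url, f in subspaces.items()}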
If Minds do share the same suite of actions, then we can make various global decisions. Say we have n Mind servers, and Mind server i's preferred action is action ai. Mind server i can quantify "how good" action a is in state x by returning: Qi(x,a), and can quantify "how bad" action a is in state x by returning: Qi(x,ai) - Qi(x,a).
Then we have 4 basic approaches [PhD, §14] (each is sketched in code after this list):

Maximize the Best Happiness: MAXa MAXi Qi(x,a) - which is in fact the same as static W=Q above, and can be implemented as an ASs server, with just one query to each Mind server to get its best action and its Q-value.

Minimize the Worst Unhappiness: MINa MAXi ( Qi(x,ai) - Qi(x,a) ) - which is an ASm server, requiring multiple queries of each Mind server.

Minimize Collective Unhappiness: MINa [ SUMi ( Qi(x,ai) - Qi(x,a) ) ] - which is an ASm server.

Maximize Collective Happiness: MAXa [ SUMi Qi(x,a) ] - which is an ASm server.
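Given the full matrix of values Qi(x,a) for the current state (which an ASm server gathers with its multiple queries), the four methods reduce to this sketch, where Q is a list of per-Mind dictionaries mapping each action to Qi(x,a):

# Hedged sketch of the 4 basic AS methods over a shared action suite.

def preferred(Q):   # ai = each Mind server i's preferred action
    return [max(Qi, key=Qi.get) for Qi in Q]

def maximize_best_happiness(Q, actions):
    return max(actions, key=lambda a: max(Qi[a] for Qi in Q))

def minimize_worst_unhappiness(Q, actions):
    ai = preferred(Q)
    return min(actions, key=lambda a:
               max(Q[i][ai[i]] - Q[i][a] for i in range(len(Q))))

def minimize_collective_unhappiness(Q, actions):
    ai = preferred(Q)
    return min(actions, key=lambda a:
               sum(Q[i][ai[i]] - Q[i][a] for i in range(len(Q))))

def maximize_collective_happiness(Q, actions):
    return max(actions, key=lambda a: sum(Qi[a] for Qi in Q))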
There are a number of other related AS methods, which can all be implemented as WWM servers:
The DAMN Architecture [Rosenblatt, 1995, Rosenblatt and Thorpe, 1995] implements an action selection method similar to Maximize Collective Happiness. The Q-values of Mind server i are multiplied by weights wi which reflect the current priorities of the system. This could be implemented as an ASm server, where the AS server maintains a set of weights wi.
Product Maximize Collective Happiness [PhD, §15.5.2], adapted from Grefenstette's work [Grefenstette, 1992], can be implemented on the WWM as an ASm server.
A number of other authors [Aylett, 1995, Tyrrell, 1993, Whitehead et al., 1993, Karlsson, 1997, Ono et al., 1996] implement, using a variety of notations, one of the 4 basic AS methods defined above [see PhD, §15.4]. Though none, as far as I am aware, have tried a Minimize the Worst Unhappiness strategy.
To reduce the number of server queries needed, the AS server may remember who won the state last time, and build up a table k(x) for the winner (or a(x) for compromise actions) so it does not have to run the competition again [PhD, §11]. Obviously this only works if the servers remain unchanged (they are not learning), and if the collection of servers remains unchanged (no new servers, or old ones leaving).
To continue that last point, the WWM server queries allow for a collection of Mind servers where new ones are added or old ones removed during the course of the run [PhD, §17.6]. The implementation of the "Add mind" and "Remove mind" commands in the MindAS server will then send a "Collection changed" message to all of its subsidiary Mind servers, to inform them that the competition has changed and they may have to re-learn their W-values.
Digney [Digney, 1996] defines Nested Q-learning, where each Mind in a collection is able to call on any of the others. Each Mind server has its own set of actions Qi(x,a) and a set of actions Qi(x,k) where action k means "do whatever server k wants to do" (as in Hierarchical Q-learning). Of course we already have in general that a MindM server can call other Mind servers. What is different here is:
In a WWM implementation, each Nested server has a list of Mind URLs, either hard-coded or passed to it at startup. So the Nested server looks like a MindAS server co-ordinating many Mind servers to make its decision. But of course it is not making the final decision. It is merely suggesting an action to the master MindAS server that coordinates the competition between the Nested servers themselves. When the master MindAS server is started up with a list of Mind servers, it passes the list to each of the servers.
Consider the number of server queries in a Nested WWM system. The master MindAS server is given x and asked for an action a. It sends x to each Mind server. Server i looks at its Q-values and either suggests an action directly, or returns the URL of some server j. Server j is not yet queried. We wait to see if server i can win the competition. (In fact, server j may already have been queried separately for its own action.) If server i wins, then we query server j and get an action, which may be "Do what server k does" and so on. As well as allowing the Nested server to return a Mind URL instead of an action, we also need the master server to tell it the Mind URL of the winner. Remember that in Hierarchical Q-learning, the master server needs to know who won, so it can put values on Q(x,i). But of course it knows itself who won. Here, the Nested server needs to know who won, so it can put values on Qi(x,k). So it needs to be told who won by the master server.
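A sketch of the master server's handling of these "action or URL" returns; suggest(url, x) is a hypothetical helper returning (kind, value, W) where kind is "action" or "url", and get_action(url, x) returns (kind, value):

# Hedged sketch of a master MindAS server in a Nested system.

def nested_resolve(x, nested_urls, suggest, get_action, inform_winner):
    bids = {url: suggest(url, x) for url in nested_urls}
    winner = max(bids, key=lambda url: bids[url][2])   # e.g. highest W
    kind, value, _ = bids[winner]
    # Server j is only queried if the server that named it actually won.
    while kind == "url":
        kind, value = get_action(value, x)
    for url in nested_urls:
        inform_winner(url, winner=winner)   # needed to update Qi(x,k)
    return value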
In the basic model we just described, all Mind servers can call all other Mind servers in the list. But in fact, the list could be different for each server. Each server could hard-code its own list of servers that it may call, much as any hand-written MindM server hard-codes its list of servers. One confusion would arise when we tell a server who won and pass it the URL of a server that is not in its own list of possible servers to call.
[PhD, §18.1] also shows how some of the Nested servers might actually be outside the Action Selection competition, and simply wait to be called by a server that is in the competition. I call these "passive" servers. We have the same with hand-coded MindM servers, where some Mind servers may have to wait to be called by others. A server may be "passive" in one Society and at the same time "active" (i.e. the server is in the Action Selection loop) in a different Society.
Watkins [Watkins, 1989] defines a Feudal (or "slave") Q-learner as one that accepts commands of the form "Take me to state c". On the WWM, these Feudal Mind servers will be driven by other Mind servers that actually have preferences about what goal state to get to. In Watkins' system, the command is part of the current state. Using the notation (x,c), a -> (y,c), the slave will receive rewards for transitions of the form (*,c), a -> (c,c). So the master server drives the slave server by explicitly altering the state for it. We do not have to change our definition of the server above: it receives x and produces a. It is just that the server driving it is constructing the state x rather than simply passing it on from above.
Another possibility is that the real state and the command are explicitly separated in the server query, which is what we allowed for with the additional MindFeu queries above. Using either of these approaches, the WWM model allows for Mind servers that provide a service of taking one to explicit goal states. e.g. Moore [Moore, 1990] has a concept of an explicit goal state, and Kaelbling's Hierarchical Distance to Goal (HDG) algorithm [Kaelbling, 1993] addresses the issue of giving the server new explicit goal states at run-time.
The Nested and Feudal models are combined in [PhD, Fig. 18.4] showing the general form of a Society of Mind based on Reinforcement Learning. It is suggested that Reinforcement Learning would be one of the most fruitful areas in which to begin implementing the WWM. That is, we shall begin with a sub-symbolic Society of Mind. Indeed, the whole model of a complex, overlapping, competing, duplicated Society of Mind that we have developed in this paper is based on the generalised form of a Society of Mind based on Reinforcement Learning.
Baum's Economy of Mind [Baum, 1996] has new Mind servers paying off the old ones to gain control. This can still be done through our model. The payments would be managed through the MindAS server, which receives payments through the W-value, and redistributes them through the "Inform it about winner" command.
So far we have only defined a protocol for conflict resolution where the AS server makes queries of the Minds for different numeric weights, e.g. "How much will you pay to stop this happening?". As discussed, we may need further protocols for more sophisticated, symbolic communication among Mind servers. We imagine that numeric weights will be easily generated by sub-symbolic Minds, and are harder to generate in symbolic Minds. This is because symbolic Minds often know what they want to do but not "how much" they want to do it. Sub-symbolic Minds, who prefer certain actions precisely because numbers for that action have somehow risen higher than numbers for other actions, may be able to say precisely "how much" they want to do something, and quantify how bad alternative actions would be [PhD, §5.2].
The Action Selection model may sometimes be limiting, though, in demanding that the Mind win the competition objectively. We cannot just say that "Mind M3 should always win when the World is in state Z". Instead, M3 has to win the competition in that state. The solution then is to surround the Action Selection with a MindM server that ensures that M3 wins in state Z.
It may be that in the symbolic domain we will make a lot more use of MindM servers, and maybe even avoid Action Selection altogether. This might be a popular alternative to having Minds generate Weights to resolve competition. The drawback, of course, is that the MindM server needs a lot of intelligence. It needs to understand the goals of all the Mind servers. This relates to the "homunculus" problem, or the need for an intelligent headquarters, see [PhD, §5].
We now ask what actual technology we should use to implement the WWM queries. I suggest one overriding objective: that it should be as easy as possible for AI researchers to put their Minds and Worlds online as servers.
The server authors are interested in AI, not necessarily in networks. They may only know AI programming languages such as LISP. They may have never written a network application, and they may not want to learn. If we accept this criterion, then we should seek a lowest-common-denominator approach that will enable AI researchers to put their minds and worlds online with the minimum of delay. Ideally, we would have the following:
As [Bosak and Bray, 1999] put it: "schemes that rely on complex, direct program-to-program interaction have not worked well in practice, because they depend on a uniformity of processing that does not exist." [Paulos and Canny, 1996] make a strong call for a lowest-common-denominator approach in remote access to robots.
<xml> <query name="New run"> <data name="world run ID"> 40031 </data> <data name="world display URL"> http://worldserver/currentruns/40031.html </data> </query> </xml>
The mind server replies, assigning the client a unique run ID. The server writes to stdout:
<xml> <response name="New run"> <data name="mind run ID"> 5505 </data> <data name="mind display URL"> http://mindserver/currentruns/5505.html </data> </response> </xml>
The client uses this unique ID in each subsequent request:
<xml> <query name="Get action"> <data name="mind run ID"> 5505 </data> <data name="state"> x </data> </query> </xml>
The state is simply a series of text characters representing the state, terminated by the </data> tag. How to decode these characters is something that the servers have to agree among themselves. The mind server returns an action:
<xml> <response name="Get action"> <data name="action"> a </data> </response> </xml>
Again, the format of the action is something the servers have to agree on.
Though it has to be noted that the name "AIML" is currently being used in a much more restricted domain - it is being used by the ALICE chatbot project [alicebot.org] as a means of allowing users to define their own chatbots. In this case, XML is used to define the program itself rather than the data being passed back and forth. A chatbot is only one of many possible types of WWM server, so a name like "ChatML" or even "ElizaML" would have been more appropriate than "AIML". Criticism of this point aside, one interesting thing about ALICE is that, in trying to allow non-technical users to construct entire servers, it in some ways takes 3rd party involvement even further than envisaged in this work.
The Swarm project, which aims to provide a standardised simulator for multi-agent worlds, has also recently moved to XML because of the restrictions inherent in forcing the use of any particular programming language [Daniels, 1999]. Again, XML is used as a way of defining the environment and agents and avoiding programming, rather than as a way of communicating between separate minds and worlds. [Noda et al., 1998] is probably the closest previous work to the WWM, using a client-server design, programming language independent, transmitting ASCII strings.
A WWM server is addressed by a URL of the form: http://site/directory/program
All arguments (including the type of WWM request being sent) are passed in stdin.
This is actually a well-known issue with CGI programs - how to keep persistent state behind short-lived CGI invocations - and there are many approaches to it discussed in the CGI community, especially where scripts talk to large databases. How exactly to do it depends on the programming language and server used, and there is no standard solution.
It is not the job of the WWM to propose a solution to the problem of persistent CGI. WWM server authors should be able to use any of the technologies that the CGI community propose. Here we simply note that for an efficient WWM server the server author may need to learn some network programming, which is something we wanted to avoid. But this is not necessary to set up a WWM server at all, which can be done using the save-to-disk method.
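A sketch of the save-to-disk method for a CGI-style server: each invocation reads one query from stdin, reloads its run state from disk, answers on stdout, and saves the state again. The plain key=value message format here stands in for the XML protocol, and choose_action() stands in for the AI itself:

import os, pickle, sys

# Hedged sketch of a save-to-disk WWM Mind server, invoked once per
# CGI request, with no persistent process.

def parse_query(text):
    # Assumption: one "key=value" pair per line, standing in for the XML.
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def choose_action(memory, state):
    return "a"                     # the actual AI goes here

def handle_request():
    query = parse_query(sys.stdin.read())
    path = "runs/%s.pickle" % query.get("mind run ID", "0")
    memory = pickle.load(open(path, "rb")) if os.path.exists(path) else {}
    if query.get("name") == "Get action":
        sys.stdout.write(choose_action(memory, query.get("state", "")))
    with open(path, "wb") as f:
        pickle.dump(memory, f)     # persist run state between invocations

if __name__ == "__main__":
    os.makedirs("runs", exist_ok=True)
    handle_request()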
The AI programmer might get the server online initially using save-to-disk, and maybe sometime later, after learning some network programming, convert to a persistent process. The client would not notice any difference (except a speed improvement).
It is also the same issue as: if the client never issues an "End run" command, can we time it out when we have not received queries from it in a long time? This would be important to relinquish control of a shared resource, e.g. a robot [Stein, 1998]. This could still be client-driven: the old client gets timed out when a new client makes its "New run" attempt.
There is a similar "quick and dirty" way of making an asynchronous world while still retaining the simple client-driven model: the next time the client makes a request, the server calculates the time since the last request, and runs the world forward that number of timesteps before replying. An alternative, more complex, way would be for the server to have its own "clock-tick" client, which sends it a periodic clock tick using the "No operation" command. Each time it receives a clock tick it can update the world.
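A sketch of the catch-up approach; step() advances the simulation by one timestep, and TICK (the real-time length of one timestep) is an assumption:

import time

# Hedged sketch of the "catch-up" method for an asynchronous World
# server: call catch_up() before answering any client query.

TICK = 0.1                # assumed seconds of real time per timestep

class CatchUpWorld:
    def __init__(self, step):
        self.step = step
        self.last = time.time()

    def catch_up(self):
        now = time.time()
        # Run the world forward by however long the client was away.
        for _ in range(int((now - self.last) / TICK)):
            self.step()
        self.last = now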