
F Full list of action selection methods

Here we consider only the numbers that can be generated at a single timestep at which some action is taken. That is, we use only the Q-value an agent has for the executed action, or the loss it is suffering. We omit methods that use terms that have no meaning within this timestep, such as (from §10):

We also consider only maximizing, minimizing and summing. We omit other possible methods, such as product (§15.5.2) or standard deviation (§12.4).

Many methods below make no sense, such as maximizing unhappiness. Ones that make some sense are in bold.
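The space of methods described above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: it assumes each agent i holds a Q-value Q[i][a] for every action a in the current state, and that an agent's loss under action a is its best Q-value minus Q[i][a]. The names `losses` and `enumerate_methods` are hypothetical.

```python
from itertools import product

def losses(Q):
    """Loss of each agent under each action: its best Q-value minus the
    Q-value it gets for that action (assumed definition, for illustration)."""
    return [[max(row) - q for q in row] for row in Q]

AGGREGATES = {"max": max, "min": min, "sum": sum}  # combine across agents
SELECTORS = {"max": max, "min": min}               # choose across actions

def enumerate_methods(Q):
    """Map each (selector, aggregate, term) combination to the action it picks."""
    terms = {"Q": Q, "loss": losses(Q)}
    n_actions = len(Q[0])
    chosen = {}
    for sel, agg, term in product(SELECTORS, AGGREGATES, terms):
        table = terms[term]
        # One score per action: the aggregate of that action's column.
        scores = [AGGREGATES[agg](row[a] for row in table) for a in range(n_actions)]
        chosen[(sel, agg, term)] = SELECTORS[sel](range(n_actions), key=scores.__getitem__)
    return chosen

# Hypothetical example: two agents, three actions.
Q = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.7]]
chosen = enumerate_methods(Q)
```

In this sketch, combinations such as `("max", "sum", "loss")` correspond to maximizing unhappiness, one of the cases that makes no sense, while `("max", "sum", "Q")` and `("min", "sum", "loss")` sum over all agents in the manner of a Collective method.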

F.1 Search for compromise action

Search for this, and take action a:

[equation not recovered]

[equation not recovered]

F.2 Use only suggested actions

Search for this, and take action

[equation not recovered]

[equation not recovered]
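The difference between this section and F.1 is the set of candidate actions being searched. The following is a minimal sketch, assuming that each agent suggests its own greedy action (the argmax of its Q-values) and using the sum of Q-values purely as an example of the quantity searched for. The function names are hypothetical.

```python
def suggested_actions(Q):
    """Each agent's preferred action: the argmax of its own Q-values (assumed)."""
    return {max(range(len(row)), key=row.__getitem__) for row in Q}

def best_action(Q, candidates):
    """Action among `candidates` with the highest Q-value summed over agents."""
    return max(sorted(candidates), key=lambda a: sum(row[a] for row in Q))

# Hypothetical example: two agents, three actions.
Q = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.7]]
compromise = best_action(Q, range(len(Q[0])))     # F.1: search every action
suggested = best_action(Q, suggested_actions(Q))  # F.2: only suggested actions
```

Here the search over all actions finds action 2, which neither agent suggests, whereas the restricted search can return only action 0 or action 1.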

Search for this, and take action :

Other methods

In fact, we also omitted combinations of two of these terms. For instance, Tony Prescott suggested the following:

[equation not recovered]

That is, the gain by the winner, minus the losses it causes the others. Note that:

This method also seems to occupy the desirable middle ground between Maximize the Best Happiness and the Collective methods. It goes some way towards answering the criticism of Maximize the Best Happiness: if the winner would make a gain whatever action is taken, it will prefer actions that cause smaller losses to the other agents. We see that it would be successfully opportunistic in the example in §9.1. And it is still more individual-driven than the pure Collective methods - it successfully picks in the first example in §12.3.
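One possible reading of this combined term can be sketched as follows. It assumes that the winner's gain under action a is its Q-value Q[k][a], and that the loss caused to another agent j is j's best Q-value minus Q[j][a]. Neither definition is confirmed by the text here, and the function names are hypothetical. The search runs over every (winner, action) pair.

```python
def prescott_score(Q, k, a):
    """Gain of candidate winner k under action a, minus the losses that
    action a causes every other agent (assumed definitions)."""
    others_loss = sum(max(row) - row[a] for j, row in enumerate(Q) if j != k)
    return Q[k][a] - others_loss

def prescott_select(Q):
    """Return the (winner, action) pair with the highest score."""
    pairs = [(k, a) for k in range(len(Q)) for a in range(len(Q[0]))]
    return max(pairs, key=lambda p: prescott_score(Q, *p))

# Hypothetical example: agent 0 gains equally from actions 0 and 1,
# but action 1 costs agent 1 less.
Q = [[1.0, 1.0, 0.0],
     [0.0, 0.5, 0.6]]
winner, action = prescott_select(Q)
```

In this example agent 0 wins with action 1: its gain is the same as under action 0, but the loss caused to agent 1 is smaller.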

But we only have to increase the number of agents to show that it is still vulnerable to the criticisms of any Collective method. It will fail to pick here:

[example table not recovered]

The only other method of combining two terms that makes sense is:

[equation not recovered]

which again is just another Collective method.
