Dr. Mark Humphrys

School of Computing. Dublin City University.

Online coding site: Ancient Brain

coders   JavaScript worlds

Search:

Online AI coding exercises

Project ideas


Exercise - long-term reward


What is wrong with γ = 1?

i.e. Just sum all future rewards.
Consider this situation:


State-space is not geographic-space

Note state-space is not geographic-space.
Transitions could be one-way, not 2-way. No easy way back to the state you left forever.
e.g. In chess, state = your queen is captured.

or e.g. You are Hitler. It is 1941. You have conquered Europe and are unthreatened.
So you decide to invade the Soviet Union and declare war on the US (neither of whom were at war with you so far).
There is no way back ever to that state before invasion. You have gone through a one-way door.



r=0, γ = anything

If take action a, expected long-term reward:
Ea(R) = 5 + γ 0 + γ2 0 + γ3 0 + ..
    = 5

Eb(R) = 0 + γ 100 + γ2 0 + γ3 0 + ..
    = 100 γ

  1. If γ = 0, it picks a.

  2. If γ = 1, it takes action b.

  3. Exercise - What is the smallest γ for which it picks b?



r=1, γ = anything

Imagine if the infinite loops scored 1 each time round instead of 0.

Ea(R) = 5 + γ + γ2 + γ3 + ..

Eb(R) = 0 + γ 100 + γ2 + γ3 + ..

  1. If γ = 0, it still picks a.

  2. If γ = 1
    Ea(R) = 5 + 1 + 1 + ..
    Eb(R) = 0 + 100 + 1 + 1 + ...
    Both infinite, can't tell the difference, will pick either one. Not necessarily b.

    Notes on infinity:
    If γ = 1, then even if one infinite loop gives 1 forever, and the other gives 100 forever, it can't tell the difference. Both infinite. In fact, all the following are the same:

    1 + 1 + 1 + 1 + ...
    -1000 -1000 -1000 -1000 -1000 + 1 + 1 + 1 + ...
    10 + 10 + 10 + 10 + ...
    For any γ < 1 we do not have this problem.

  3. Exercise - If γ < 1, for what γ will it pick b?



r=anything, γ = anything

If the infinite loops give reward r each time round:

Ea(R) = 5 + γ r + γ2 r + γ3 r + ..
    = 5 + γ r + r (γ2 + γ3 + .. )

Eb(R) = 0 + γ 100 + γ2 r + γ3 r + ..
    = 100 γ + r (γ2 + γ3 + .. )



ancientbrain.com      w2mind.org      humphrysfamilytree.com

On the Internet since 1987.      New 250 G VPS server.

Note: Links on this site to user-generated content like Wikipedia are highlighted in red as possibly unreliable. My view is that such links are highly useful but flawed.