Dr. Mark Humphrys

School of Computing. Dublin City University.

B Bounds

B.1 Bounds with a learning rate α

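Let $D$ be updated by:

$$D := (1-\alpha) D + \alpha d$$

where $d$ is bounded by $d_{\min} \le d \le d_{\max}$, $0 \le \alpha \le 1$, and the initial value of $D$ satisfies $d_{\min} \le D \le d_{\max}$. Then:

$$d_{\min} \le D \le d_{\max}$$

Proof: The highest $D$ can be is if it is always updated with $d_{\max}$:

$$D := (1-\alpha) D + \alpha d_{\max}$$

so $D \le d_{\max}$. Similarly $D \ge d_{\min}$.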

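Note that this argument requires $0 \le \alpha \le 1$: only then is each update a weighted average of the old $D$ and $d$, so it cannot leave the range spanned by them.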
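The bound can be checked numerically. The following is a minimal sketch, not part of the original proof: the helper names `update` and `extremes`, the sample bounds, and the use of uniformly random $d$ are all assumptions made for illustration.

```python
import random

def update(D, d, alpha):
    """One step of D := (1 - alpha) * D + alpha * d."""
    return (1 - alpha) * D + alpha * d

def extremes(d_min, d_max, alpha, steps=10_000, seed=0):
    """Track the smallest and largest D seen over a run of random updates."""
    rng = random.Random(seed)
    D = rng.uniform(d_min, d_max)   # initial value chosen inside the bounds
    lo = hi = D
    for _ in range(steps):
        D = update(D, rng.uniform(d_min, d_max), alpha)
        lo, hi = min(lo, D), max(hi, D)
    return lo, hi

tol = 1e-9  # allowance for floating-point rounding
lo, hi = extremes(d_min=-2.0, d_max=5.0, alpha=0.3)
print(-2.0 - tol <= lo and hi <= 5.0 + tol)

# With alpha outside [0, 1] a single step can already leave the range:
print(update(-2.0, 5.0, alpha=1.5))   # 8.5
```

The last line illustrates why the restriction on α matters: with α = 1.5 the update is no longer a weighted average, and one step from $D = -2$ with $d = 5$ overshoots $d_{\max}$.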

B.2 Bounds of Q-values

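Theorem B.2: For rewards bounded by $r_{\min} \le r \le r_{\max}$ and $0 \le \gamma < 1$:

$$Q_{\min} \le Q \le Q_{\max}, \quad \text{where} \quad Q_{\min} = \frac{r_{\min}}{1-\gamma}, \quad Q_{\max} = \frac{r_{\max}}{1-\gamma}$$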

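Proof: In the discrete case, $Q$ is updated by:

$$Q(x,a) := (1-\alpha) Q(x,a) + \alpha \left( r + \gamma \max_{b} Q(y,b) \right)$$

so by Theorem B.1:

$$Q_{\max} = r_{\max} + \gamma Q_{\max}, \quad \text{so} \quad Q_{\max} = \frac{r_{\max}}{1-\gamma}$$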

This can also be viewed in terms of temporal discounting:

$$Q_{\max} = r_{\max} + \gamma r_{\max} + \gamma^2 r_{\max} + \cdots = \frac{r_{\max}}{1-\gamma}$$

Similarly:

$$Q_{\min} = \frac{r_{\min}}{1-\gamma}$$

For example, if $\gamma = \frac{1}{2}$, then $Q_{\max} = 2 r_{\max}$. And (assuming $r_{\max} > 0$) as $\gamma \to 1$, $Q_{\max} \to \infty$.

Note that since $r_{\min} \le r_{\max}$, it follows that $Q_{\min} \le Q_{\max}$.
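As a rough numerical illustration of both views of the bound, the sketch below compares a truncated discounted sum against $r/(1-\gamma)$ and runs the discrete Q update on a toy problem. The random transitions and rewards, the table size, and the function names are hypothetical choices made only for this check. Initial Q-values are drawn inside the bounds, as Theorem B.1 requires.

```python
import random

def q_bounds(r_min, r_max, gamma):
    """Return (Q_min, Q_max) = (r_min / (1 - gamma), r_max / (1 - gamma))."""
    return r_min / (1 - gamma), r_max / (1 - gamma)

def discounted_sum(r, gamma, n):
    """Truncated temporal-discounting view: r + gamma*r + ... (n terms)."""
    return sum(r * gamma**t for t in range(n))

def q_extremes(n_states=4, n_actions=2, r_min=-1.0, r_max=2.0,
               gamma=0.5, alpha=0.2, steps=20_000, seed=0):
    """Run the discrete Q update on random transitions; return min/max Q seen."""
    rng = random.Random(seed)
    q_min, q_max = q_bounds(r_min, r_max, gamma)
    # Initial Q-values inside the bounds.
    Q = [[rng.uniform(q_min, q_max) for _ in range(n_actions)]
         for _ in range(n_states)]
    lo, hi = min(map(min, Q)), max(map(max, Q))
    x = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)     # arbitrary action
        y = rng.randrange(n_states)      # arbitrary next state
        r = rng.uniform(r_min, r_max)    # arbitrary bounded reward
        Q[x][a] = (1 - alpha) * Q[x][a] + alpha * (r + gamma * max(Q[y]))
        lo, hi = min(lo, Q[x][a]), max(hi, Q[x][a])
        x = y
    return lo, hi

print(q_bounds(-1.0, 2.0, 0.5))        # (-2.0, 4.0)
print(discounted_sum(2.0, 0.5, 60))    # close to Q_max = 4.0
print(q_extremes())                    # both values lie within [-2.0, 4.0]
```

With $\gamma = 0.5$ the bounds are simply twice the reward bounds. Raising `gamma` toward 1 in `q_bounds` shows the upper bound growing without limit when $r_{\max} > 0$.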

B.3 Bounds of W-values

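Theorem B.3:

$$W_{\min} \le W \le W_{\max}, \quad \text{where} \quad W_{\min} = Q_{\min} - Q_{\max}, \quad W_{\max} = Q_{\max} - Q_{\min}$$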

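Proof: In the discrete case, $W$ is updated by:

$$W_i(x) := (1-\alpha) W_i(x) + \alpha \left( Q_i(x,a_i) - \left( r_i + \gamma \max_{b} Q_i(y,b) \right) \right)$$

so by Theorem B.1:

$$W_{\max} = Q_{\max} - \left( r_{\min} + \gamma Q_{\min} \right) = Q_{\max} - Q_{\min}$$

by Theorem B.2.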

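Similarly:

$$W_{\min} = Q_{\min} - \left( r_{\max} + \gamma Q_{\max} \right) = Q_{\min} - Q_{\max}$$

Note that since $Q_{\min} \le Q_{\max}$, it follows that $W_{\min} \le 0 \le W_{\max}$.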
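The same kind of check can be sketched for W-values. This is an illustrative sketch under stated assumptions: the W update is taken to have the form $W := (1-\alpha) W + \alpha \, (Q(x,a) - (r + \gamma \max_b Q(y,b)))$, and the Q-values and rewards fed into it are drawn at random from their bounded ranges rather than learned. The function names are hypothetical.

```python
import random

def w_bounds(r_min, r_max, gamma):
    """Return (W_min, W_max) = (Q_min - Q_max, Q_max - Q_min)."""
    q_min, q_max = r_min / (1 - gamma), r_max / (1 - gamma)
    return q_min - q_max, q_max - q_min

def w_extremes(r_min=-1.0, r_max=2.0, gamma=0.5, alpha=0.2,
               steps=20_000, seed=0):
    """Apply the assumed W update with random bounded inputs; return min/max W."""
    rng = random.Random(seed)
    q_min, q_max = r_min / (1 - gamma), r_max / (1 - gamma)
    W = 0.0                      # lies inside the bounds, since W_min <= 0 <= W_max
    lo = hi = W
    for _ in range(steps):
        q_xa = rng.uniform(q_min, q_max)   # stands in for Q(x, a)
        q_y = rng.uniform(q_min, q_max)    # stands in for max_b Q(y, b)
        r = rng.uniform(r_min, r_max)
        d = q_xa - (r + gamma * q_y)       # quantity averaged into W
        W = (1 - alpha) * W + alpha * d
        lo, hi = min(lo, W), max(hi, W)
    return lo, hi

print(w_bounds(-1.0, 2.0, 0.5))   # (-6.0, 6.0)
print(w_extremes())               # both values lie within [-6.0, 6.0]
```

The largest possible update target occurs at $Q(x,a) = Q_{\max}$, $r = r_{\min}$ and $\max_b Q(y,b) = Q_{\min}$, which with these sample numbers gives $4 - (-1 + 0.5 \cdot (-2)) = 6$, matching $W_{\max}$.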
