Dr. Mark Humphrys

School of Computing. Dublin City University.




B Bounds

B.1 Bounds with a learning rate α

Let D be updated by:

\[ D := (1-\alpha) D + \alpha d \]

where $d_{min} \le d \le d_{max}$, and the initial value of $\alpha$ is 1 (so the first update overwrites the starting value of D). Then:

Theorem B.1  $D_{max} = d_{max}$ and $D_{min} = d_{min}$.

Proof: The highest D can be is if it is always updated with $d_{max}$:

\[ D := (1-\alpha) D + \alpha d_{max} \]

so $D_{max} = d_{max}$. Similarly $D_{min} = d_{min}$. $\Box$
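The step from the update rule to the bound can be spelled out by induction. The following is a sketch, assuming $0 \le \alpha \le 1$, with $d_{max}$ the upper bound on $d$. The index $n$ (so that $D_n$ is the value after $n$ updates and $d_n$ the $n$-th sample) is notation introduced here and is not used in the original:

```latex
\begin{align*}
D_1     &= (1-1)\,D_0 + 1 \cdot d_1 = d_1 \le d_{max}
        && \text{(first update, } \alpha = 1\text{)} \\
D_{n+1} &= (1-\alpha)\,D_n + \alpha\, d_{n+1} \\
        &\le (1-\alpha)\,d_{max} + \alpha\, d_{max} = d_{max}
        && \text{(if } D_n \le d_{max}\text{)}
\end{align*}
```

The lower bound follows the same way with every $\le d_{max}$ replaced by $\ge d_{min}$.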

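Note that this argument only holds for 0 ≤ α ≤ 1. Each update is then a weighted average of the old D and d, so it cannot move D outside the range they span.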
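As a quick numerical check of the bound, here is a minimal sketch in Python. The function names `update` and `simulate`, the random choice of α after the first step, and the starting value `1e9` are all assumptions made for illustration and do not come from the original.

```python
import random

def update(D, d, alpha):
    """One step of D := (1 - alpha) * D + alpha * d."""
    return (1.0 - alpha) * D + alpha * d

def simulate(d_min, d_max, steps=10_000, seed=0):
    """Apply the update repeatedly and return the lowest and highest D seen.

    Assumptions (illustrative only): d is drawn uniformly from
    [d_min, d_max]; alpha is 1 on the first step and then drawn at
    random from [0, 1).
    """
    rng = random.Random(seed)
    D = 1e9  # arbitrary start, overwritten by the first update (alpha = 1)
    seen = []
    for n in range(1, steps + 1):
        alpha = 1.0 if n == 1 else rng.random()
        d = rng.uniform(d_min, d_max)
        D = update(D, d, alpha)
        seen.append(D)
    return min(seen), max(seen)

lowest, highest = simulate(-2.0, 5.0)
# Allow a tiny tolerance for floating-point rounding.
print(-2.0 - 1e-9 <= lowest <= highest <= 5.0 + 1e-9)
```

Starting D far outside the interval is deliberate: it shows that the initial value has no influence once the first update is made with α = 1.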



B.2 Bounds of Q-values

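Theorem B.2  $Q_{max} = \frac{r_{max}}{1-\gamma}$ and $Q_{min} = \frac{r_{min}}{1-\gamma}$, for $0 \le \gamma < 1$.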

Proof: In the discrete case, Q is updated by:

\[ Q(x,a) := (1-\alpha) Q(x,a) + \alpha \left( r + \gamma \max_{b} Q(y,b) \right) \]

so by Theorem B.1:

\[ Q_{max} = r_{max} + \gamma Q_{max} \quad \Rightarrow \quad Q_{max} = \frac{r_{max}}{1-\gamma} \]

This can also be viewed in terms of temporal discounting:

\[ Q_{max} = r_{max} + \gamma r_{max} + \gamma^2 r_{max} + \cdots = \frac{r_{max}}{1-\gamma} \]

Similarly:

\[ Q_{min} = r_{min} + \gamma Q_{min} = \frac{r_{min}}{1-\gamma} \]

$\Box$

For example, if $\gamma = \frac{1}{2}$, then $Q_{max} = 2 r_{max}$. And (assuming $r_{max} > 0$) as $\gamma \rightarrow 1$, $Q_{max} \rightarrow \infty$.

Note that since $r_{min} < r_{max}$, it follows that $Q_{min} < Q_{max}$.
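The Q-value bounds can be checked on a toy problem. The sketch below is a hypothetical setup, not the thesis's experiments: the world has random transitions and rewards drawn uniformly from the reward range, α is taken as 1, 1/2, 1/3, ... per state-action pair, and every Q entry starts at the lower bound so that entries not yet visited are already inside the bounds. The names `q_bounds` and `run_q_learning` are introduced here.

```python
import random

def q_bounds(r_min, r_max, gamma):
    """Return (Q_min, Q_max) = (r_min / (1 - gamma), r_max / (1 - gamma))."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return r_min / (1.0 - gamma), r_max / (1.0 - gamma)

def run_q_learning(n_states=4, n_actions=3, gamma=0.9,
                   r_min=-1.0, r_max=2.0, steps=20_000, seed=0):
    """Tabular Q-learning on a made-up random world (illustrative only)."""
    rng = random.Random(seed)
    q_lo, _ = q_bounds(r_min, r_max, gamma)
    # Assumption: start every entry inside the bounds.
    Q = [[q_lo] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    x = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)      # random exploration
        y = rng.randrange(n_states)       # random next state
        r = rng.uniform(r_min, r_max)     # bounded reward
        visits[x][a] += 1
        alpha = 1.0 / visits[x][a]        # 1, 1/2, 1/3, ...
        target = r + gamma * max(Q[y])
        Q[x][a] = (1.0 - alpha) * Q[x][a] + alpha * target
        x = y
    return Q

print(q_bounds(0.0, 1.0, 0.5))  # (0.0, 2.0)

Q = run_q_learning()
lo, hi = q_bounds(-1.0, 2.0, 0.9)
# Small tolerance for floating-point rounding.
print(all(lo - 1e-9 <= q <= hi + 1e-9 for row in Q for q in row))
```

The check works because the update target `r + gamma * max(Q[y])` can never exceed $r_{max} + \gamma Q_{max} = Q_{max}$ while all entries are within the bounds, which is the fixed-point argument used in the proof.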




B.3 Bounds of W-values

Theorem B.3  $W_{max} = Q_{max} - Q_{min}$ and $W_{min} = Q_{min} - Q_{max}$.

Proof: In the discrete case, W is updated by:

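\[ W_i(x) := (1-\alpha) W_i(x) + \alpha \left( Q_i(x,a_i) - \left( r_i + \gamma \max_{b} Q_i(y,b) \right) \right) \]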

so by Theorem B.1:

\[ W_{max} = Q_{max} - (r_{min} + \gamma Q_{min}) = Q_{max} - Q_{min} \]

by Theorem B.2.

Similarly:

\[ W_{min} = Q_{min} - (r_{max} + \gamma Q_{max}) = Q_{min} - Q_{max} = -W_{max} \]

$\Box$

Note that since $Q_{min} < Q_{max}$, it follows that $W_{min} < 0 < W_{max}$.
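The W-value bounds follow from the Q-value bounds by arithmetic, which the sketch below illustrates. It is a simplified stand-in rather than a W-learning implementation: instead of learned Q-values it draws the two Q terms and the reward at random from their allowed ranges, and it assumes α = 1, 1/2, 1/3, .... The names `q_bounds`, `w_bounds` and `simulate_w` are introduced here.

```python
import random

def q_bounds(r_min, r_max, gamma):
    """(Q_min, Q_max) for rewards in [r_min, r_max] and 0 <= gamma < 1."""
    return r_min / (1.0 - gamma), r_max / (1.0 - gamma)

def w_bounds(r_min, r_max, gamma):
    """(W_min, W_max) = (Q_min - Q_max, Q_max - Q_min)."""
    q_lo, q_hi = q_bounds(r_min, r_max, gamma)
    return q_lo - q_hi, q_hi - q_lo

def simulate_w(r_min, r_max, gamma, steps=5_000, seed=0):
    """Run the W update on random inputs; return lowest and highest W seen.

    Assumption (illustrative only): Q(x, a_i), max_b Q(y, b) and r are
    sampled uniformly from their bounds rather than learned.
    """
    rng = random.Random(seed)
    q_lo, q_hi = q_bounds(r_min, r_max, gamma)
    W = 0.0
    trace = []
    for n in range(1, steps + 1):
        alpha = 1.0 / n                    # initial alpha = 1
        q_xa = rng.uniform(q_lo, q_hi)     # stands in for Q_i(x, a_i)
        q_next = rng.uniform(q_lo, q_hi)   # stands in for max_b Q_i(y, b)
        r = rng.uniform(r_min, r_max)
        d = q_xa - (r + gamma * q_next)
        W = (1.0 - alpha) * W + alpha * d
        trace.append(W)
    return min(trace), max(trace)

print(w_bounds(0.0, 1.0, 0.5))  # (-2.0, 2.0)

w_lo, w_hi = w_bounds(-1.0, 2.0, 0.9)
lowest, highest = simulate_w(-1.0, 2.0, 0.9)
print(w_lo - 1e-9 <= lowest <= highest <= w_hi + 1e-9)
```

Random inputs rarely come close to the extremes, so this only confirms that the bounds are never exceeded, not that they are reached.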


