Simple item record

dc.contributor.author    Jaakkola, Tommi    en_US
dc.contributor.author    Jordan, Michael I.    en_US
dc.contributor.author    Singh, Satinder P.    en_US
dc.date.accessioned    2004-10-20T20:49:46Z
dc.date.available    2004-10-20T20:49:46Z
dc.date.issued    1993-08-01    en_US
dc.identifier.other    AIM-1441    en_US
dc.identifier.other    CBCL-084    en_US
dc.identifier.uri    http://hdl.handle.net/1721.1/7205
dc.description.abstract    Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.    en_US
dc.format.extent    15 p.    en_US
dc.format.extent    77605 bytes
dc.format.extent    356324 bytes
dc.format.mimetype    application/octet-stream
dc.format.mimetype    application/pdf
dc.language.iso    en_US
dc.relation.ispartofseries    AIM-1441    en_US
dc.relation.ispartofseries    CBCL-084    en_US
dc.subject    reinforcement learning    en_US
dc.subject    stochastic approximation    en_US
dc.subject    convergence    en_US
dc.subject    dynamic programming    en_US
dc.title    On the Convergence of Stochastic Iterative Dynamic Programming Algorithms    en_US


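The abstract above names Watkins' Q-learning as one of the DP-based learning algorithms whose convergence the paper establishes. For reference, below is a minimal Python sketch of the standard tabular Q-learning update applied to a randomly generated toy MDP; the environment, the 1/visit-count step-size schedule, and the uniform exploration policy are illustrative assumptions, not details taken from the paper.

import numpy as np

def q_learning(P, R, gamma=0.9, steps=5000, seed=0):
    """Tabular Q-learning on a finite MDP (illustrative sketch).

    P: transition probabilities, shape (S, A, S)
    R: expected rewards, shape (S, A)
    Step sizes alpha = 1 / visit count satisfy the usual stochastic-approximation
    conditions (sum alpha = infinity, sum alpha^2 < infinity) when every
    state-action pair is visited infinitely often.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    counts = np.zeros((S, A))
    s = 0
    for _ in range(steps):
        a = rng.integers(A)                # uniform exploratory behavior policy (assumption)
        s_next = rng.choice(S, p=P[s, a])  # sample the next state from the MDP
        counts[s, a] += 1
        alpha = 1.0 / counts[s, a]
        # Watkins' update: move Q(s, a) toward the sampled one-step Bellman backup.
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S, A = 4, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))  # hypothetical random transition kernel
    R = rng.normal(size=(S, A))                 # hypothetical random rewards
    print(q_learning(P, R))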