Simple item record

dc.contributor.author    Jaakkola, Tommi    en_US
dc.contributor.author    Jordan, Michael I.    en_US
dc.contributor.author    Singh, Satinder P.    en_US
dc.date.accessioned    2004-10-20T20:49:46Z
dc.date.available    2004-10-20T20:49:46Z
dc.date.issued    1993-08-01    en_US
dc.identifier.other    AIM-1441    en_US
dc.identifier.other    CBCL-084    en_US
dc.identifier.uri    http://hdl.handle.net/1721.1/7205
dc.description.abstract    Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.    en_US
dc.format.extent    15 p.    en_US
dc.format.extent    77605 bytes
dc.format.extent    356324 bytes
dc.format.mimetype    application/octet-stream
dc.format.mimetype    application/pdf
dc.language.iso    en_US
dc.relation.ispartofseries    AIM-1441    en_US
dc.relation.ispartofseries    CBCL-084    en_US
dc.subject    reinforcement learning    en_US
dc.subject    stochastic approximation    en_US
dc.subject    convergence    en_US
dc.subject    dynamic programming    en_US
dc.title    On the Convergence of Stochastic Iterative Dynamic Programming Algorithms    en_US


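The abstract above names Watkins' Q-learning as one of the DP-based learning algorithms whose convergence the paper establishes. For reference, below is a minimal Python sketch of the standard tabular Q-learning update applied to a randomly generated toy MDP; the environment, the 1/visit-count step-size schedule, and the uniform exploration policy are illustrative assumptions, not details taken from the paper.

import numpy as np

def q_learning(P, R, gamma=0.9, steps=5000, seed=0):
    """Tabular Q-learning on a finite MDP (illustrative sketch).

    P: transition probabilities, shape (S, A, S)
    R: expected rewards, shape (S, A)
    Step sizes alpha = 1 / visit count satisfy the usual stochastic-approximation
    conditions (sum alpha = infinity, sum alpha^2 < infinity) when every
    state-action pair is visited infinitely often.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    counts = np.zeros((S, A))
    s = 0
    for _ in range(steps):
        a = rng.integers(A)                # uniform exploratory behavior policy (assumption)
        s_next = rng.choice(S, p=P[s, a])  # sample the next state from the MDP
        counts[s, a] += 1
        alpha = 1.0 / counts[s, a]
        # Watkins' update: move Q(s, a) toward the sampled one-step Bellman backup.
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S, A = 4, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))  # hypothetical random transition kernel
    R = rng.normal(size=(S, A))                 # hypothetical random rewards
    print(q_learning(P, R))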