Probability density estimation of the Q Function for reinforcement learning

Agostini, Alejandro Gabriel; Celaya Llover, Enric

dc.contributor.author	Agostini, Alejandro Gabriel
dc.contributor.author	Celaya Llover, Enric
dc.contributor.other	Institut de Robòtica i Informàtica Industrial
dc.date.accessioned	2010-04-01T11:30:53Z
dc.date.available	2010-04-01T11:30:53Z
dc.date.issued	2009
dc.identifier.uri	http://hdl.handle.net/2117/6856
dc.description.abstract	Performing Q-Learning in continuous state-action spaces remains an unsolved problem for many complex applications. The Q function may be rather complex and cannot be expected to fit a predefined parametric model. In addition, the function approximation must cope with the high non-stationarity of the estimated q values, the on-line nature of the learning, in which sampling is strongly biased toward convergence regions, and the large amount of generalization required for a feasible implementation. To cope with these problems, local, non-parametric function approximations seem more suitable than global parametric ones. One kind of function approximation that is gaining special interest in the field of machine learning is that based on densities. Estimating densities provides more information than simple function approximation, and this information can be used to deal with the problems of Reinforcement Learning. For instance, density estimation makes it possible to know the actual distribution of the q values for any given state-action pair, and indicates how much data has been collected in different regions of the domain. In this work we propose a Q-Learning approach for continuous state-action spaces based on joint density estimation. The density is represented with a Gaussian Mixture Model that is fitted with an on-line version of the Expectation-Maximization algorithm. We propose a method that handles the biased sampling problem with good performance. Experiments performed on a test problem show remarkable improvements over previously published results.
dc.language.iso	eng
dc.subject	UPC subject areas::Computer science::Artificial intelligence::Machine learning
dc.subject.lcsh	Pattern recognition systems
dc.subject.lcsh	Reinforcement learning
dc.title	Probability density estimation of the Q Function for reinforcement learning
dc.type	External research report
dc.subject.lemac	Pattern recognition (Computer science)
dc.subject.lemac	Machine learning
dc.subject.ams	AMS classification::68 Computer science::68T Artificial intelligence
dc.subject.inspec	INSPEC classification::Pattern recognition
dc.rights.access	Open Access
local.identifier.drac	2169742
dc.description.version	Preprint
local.personalitzacitacio	true

Files in this item

Name: agostini.pdf
Size: 266.4 KB
Format: PDF

This item appears in the following collections

Research reports [50]

UPCommons. Open knowledge portal of the UPC
