DSpace UPC


Use this identifier to cite or link to this item: http://hdl.handle.net/2117/6856

File: agostini.pdf (266.42 kB, Adobe PDF)

Title: Probability density estimation of the Q Function for reinforcement learning
Authors: Agostini, Alejandro Gabriel; Celaya Llover, Enric
Date: 2009
Document type: External research report
Abstract: Performing Q-Learning in continuous state-action spaces is still an unsolved problem for many complex applications. The Q function may be rather complex and cannot be expected to fit a predefined parametric model. In addition, the function approximation must cope with the high non-stationarity of the estimated q values, the on-line nature of the learning, in which sampling is strongly biased towards convergence regions, and the large amount of generalization required for a feasible implementation. To cope with these problems, local non-parametric function approximations seem more suitable than global parametric ones. One kind of function approximation that is attracting particular interest in machine learning is that based on density estimation. Estimating densities provides more information than simple function approximation, and this information can be used to deal with the problems of Reinforcement Learning. For instance, density estimation makes it possible to know the actual distribution of the q values for any given state-action pair, and indicates how much data has been collected in different regions of the domain. In this work we propose a Q-Learning approach for continuous state-action spaces based on joint density estimation. The density is represented with a Gaussian Mixture Model fitted using an on-line version of the Expectation-Maximization algorithm. We propose a method that handles the biased sampling problem with good performance. Experiments performed on a test problem show remarkable improvements over previously published results.
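The abstract's idea of reading q values off a joint density can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an already fitted Gaussian Mixture Model over the joint vector z = (x, q), where x is the concatenated state-action vector and q is the last coordinate, and it omits the on-line Expectation-Maximization update entirely. The function names `gaussian_pdf` and `conditional_q` and the parameter layout are hypothetical, chosen for illustration. The sketch returns the conditional mean E[q | x] together with the marginal density p(x), which gives a rough indication of how much data the model has seen near x.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) evaluated at x."""
    d = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff)  # cov^{-1} (x - mean)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ solved) / norm)


def conditional_q(x, weights, means, covs):
    """Return (E[q | x], p(x)) for a joint GMM over z = (x, q).

    Assumed layout (hypothetical): each mean has length d + 1 and each
    covariance is (d + 1) x (d + 1), with q as the last coordinate.
    """
    d = x.shape[0]
    k_total = len(weights)
    resp = np.empty(k_total)        # unnormalised weight of each component at x
    cond_means = np.empty(k_total)  # per-component conditional mean of q given x
    for k in range(k_total):
        mu, cov = means[k], covs[k]
        mu_x, mu_q = mu[:d], mu[d]
        cov_xx, cov_qx = cov[:d, :d], cov[d, :d]
        resp[k] = weights[k] * gaussian_pdf(x, mu_x, cov_xx)
        # Standard Gaussian conditioning: mu_q + S_qx S_xx^{-1} (x - mu_x)
        cond_means[k] = mu_q + cov_qx @ np.linalg.solve(cov_xx, x - mu_x)
    density = resp.sum()  # marginal p(x) under the mixture
    return float(resp @ cond_means / density), float(density)


if __name__ == "__main__":
    # Toy model with a one-dimensional x and a single component.
    weights = np.array([1.0])
    means = [np.array([0.0, 0.0])]
    covs = [np.array([[1.0, 0.5], [0.5, 1.0]])]
    q_hat, p_x = conditional_q(np.array([2.0]), weights, means, covs)
    print(q_hat, p_x)
```

In a Q-Learning loop, an estimate of this kind could be queried for candidate actions at the current state, with p(x) used to judge how well supported the estimate is; how the report itself uses the density is described in the full text, not here.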
URI: http://hdl.handle.net/2117/6856
Appears in collections: Institut de Robòtica i Informàtica Industrial, CSIC-UPC. Reports de recerca
Altres. Enviament des de DRAC

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

For any intended use not provided for by law, contact: sepi.bupc@upc.edu

 

Universitat Politècnica de Catalunya. Servei de Biblioteques, Publicacions i Arxius