
Use this identifier to cite or link to this item: http://hdl.handle.net/2117/12093

File: agostini.pdf
Size: 1.03 MB
Format: Adobe PDF

Citation: Agostini, A.G.; Celaya, E. Reinforcement learning with a Gaussian mixture model. In: International Joint Conference on Neural Networks. "2010 International Joint Conference on Neural Networks". Barcelona: 2010, p. 3485-3492.
Title: Reinforcement learning with a Gaussian mixture model
Authors: Agostini, Alejandro Gabriel; Celaya Llover, Enric
Date: 2010
Document type: Conference report
Abstract: Recent approaches to Reinforcement Learning (RL) with function approximation include Neural Fitted Q Iteration and the use of Gaussian Processes. They belong to the class of fitted value iteration algorithms, which use a set of support points to fit the value function in a batch iterative process. These techniques make efficient use of a reduced number of samples by reusing them as needed, and are appropriate for applications where the cost of experiencing a new sample is higher than that of storing and reusing it, but this comes at the expense of increased computational effort, since these algorithms are not incremental. On the other hand, non-parametric models for function approximation, like Gaussian Processes, are preferred over parametric ones because of their greater flexibility. A further advantage of using Gaussian Processes for function approximation is that they make it possible to quantify the uncertainty of the estimate at each point. In this paper, we propose a new approach for RL in continuous domains based on probability density estimation. Our method combines the best features of the previous methods: it is non-parametric and provides an estimate of the variance of the approximated function at any point of the domain. In addition, our method is simple, incremental, and computationally efficient. All these features make this approach more appealing than Gaussian Processes and fitted value iteration algorithms in general.
URI: http://hdl.handle.net/2117/12093
DOI: 10.1109/IJCNN.2010.5596306
Publisher version: http://dx.doi.org/10.1109/IJCNN.2010.5596306
Appears in collections: Altres. Enviament des de DRAC (Others. Submissions from DRAC)
VIS - Visió Artificial i Sistemes Intel.ligents. Conference papers and presentations
Institut de Robòtica i Informàtica Industrial, CSIC-UPC. Conference papers and presentations
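
The abstract notes that a density-based approximator can return both a value estimate and its variance at any point. As a rough illustration only, and not the authors' algorithm, the sketch below conditions a Gaussian mixture fitted over the joint (input, output) space on a query input, which is the standard Gaussian mixture regression construction. The function name `gmm_conditional` and its argument layout are hypothetical, the mixture parameters are assumed to be given, and the incremental update described in the paper is not shown.

```python
import numpy as np


def gmm_conditional(x, weights, means, covs):
    """Mean and variance of a scalar y given x under a joint GMM over (x, y).

    Hypothetical helper for illustration. Shapes: x is (d,), weights is (K,),
    means is (K, d + 1), covs is (K, d + 1, d + 1). The last coordinate of
    every component is the output y.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    d = x.shape[0]
    log_resp, cond_means, cond_vars = [], [], []
    for w, mu, S in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        S = np.asarray(S, dtype=float)
        mu_x, mu_y = mu[:d], mu[d]
        S_xx, S_xy, S_yy = S[:d, :d], S[:d, d], S[d, d]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)    # S_xx^{-1} (x - mu_x)
        gain = np.linalg.solve(S_xx, S_xy)   # S_xx^{-1} S_xy
        _, logdet = np.linalg.slogdet(S_xx)
        # Log of w_k * N(x; mu_x, S_xx): how much component k explains x.
        log_resp.append(
            np.log(w) - 0.5 * (diff @ sol + logdet + d * np.log(2.0 * np.pi))
        )
        cond_means.append(mu_y + S_xy @ sol)   # E[y | x, component k]
        cond_vars.append(S_yy - S_xy @ gain)   # Var[y | x, component k]
    log_resp = np.array(log_resp)
    beta = np.exp(log_resp - log_resp.max())   # normalise in a stable way
    beta /= beta.sum()
    m = np.array(cond_means)
    v = np.array(cond_vars)
    mean = beta @ m
    # Law of total variance over the mixture components.
    var = beta @ (v + m ** 2) - mean ** 2
    return float(mean), float(var)
```

For a single component with zero mean and covariance [[1, 0.5], [0.5, 2]], querying x = 1 gives a conditional mean of 0.5 and a variance of 1.75. In an RL setting the output coordinate would play the role of the value estimate, and the variance could guide exploration, but that mapping is an assumption here rather than something stated in this record.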

This item (except texts and images not created by the author) is subject to a Creative Commons license.

 

Universitat Politècnica de Catalunya. Libraries, Publications and Archives Service