Approximate policy iteration using regularized Bellman residuals minimization

Esposito, Gennaro; Martín Muñoz, Mario

doi:10.1080/0952813X.2015.1024494

Document type: Article

Publication date: 2016

Publisher: Taylor & Francis

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Reinforcement Learning (RL) provides a general methodology for solving complex decision problems under uncertainty, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), which has been studied extensively in the literature. We consider Policy Iteration (PI) algorithms for RL, which iteratively evaluate and improve control policies. For problems with continuous states or very large state spaces, generalization is mandatory: the ability of an RL algorithm to generalize determines how well it predicts values for unexplored states. Support Vector Regression (SVR), known for its good generalization properties, is a candidate for value function approximation. SVR has been used in batch RL frameworks, but efficient implementations of incremental exact SVR can extend its generalization ability to online RL, where the expected reward from each state changes constantly with experience. Our online SVR is therefore a novel method that allows fast and good estimation of the value function, achieving the RL objective very efficiently. Simulation tests demonstrate the feasibility and usefulness of the proposed approach.

Citation: Esposito, G., Martin, M. Approximate policy iteration using regularized Bellman residuals minimization. "Journal of Experimental & Theoretical Artificial Intelligence", 2016, vol. 28, no. 1-2, p. 3-12.

URI: http://hdl.handle.net/2117/84681

DOI: 10.1080/0952813X.2015.1024494

Publisher's version: http://www.tandfonline.com/doi/full/10.1080/0952813X.2015.1024494#.VS6nrJPcnv5

Files	Description	Size	Format	View
EspositoCCIA27Extended.pdf	Approximate Policy Iteration using kernels	1,408Mb	PDF	View/Open

UPCommons. Open knowledge portal of the UPC
