Approximate policy iteration using regularized Bellman residuals minimization
Document type: Article
Publisher: Taylor & Francis
Access conditions: Open access
Reinforcement Learning (RL) provides a general methodology for solving complex decision problems under uncertainty, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), which has been studied extensively in the literature. We consider Policy Iteration (PI) algorithms for RL, which iteratively evaluate and improve control policies. When handling problems with continuous states or very large state spaces, generalization is mandatory. The generalization ability of an RL algorithm is an important factor in predicting values for unexplored states. A candidate for value function approximation is Support Vector Regression (SVR), which is known to have good generalization properties. SVR has been used in batch RL frameworks, but smart implementations of incremental exact SVR can extend its generalization ability to online RL, where the expected reward from each state changes constantly with experience. Hence our online SVR is a novel method that allows fast and good estimation of the value function, achieving the RL objective very efficiently. Through simulation tests, the feasibility and usefulness of the proposed approach are demonstrated.
Citation: Esposito, G., Martin, M. Approximate policy iteration using regularized Bellman residuals minimization. "Journal of Experimental & Theoretical Artificial Intelligence", 2016, vol. 28, no. 1-2, p. 3-12.