Rights access: Restricted access - publisher's policy (embargoed until 2017-01-31)
Reinforcement Learning (RL) provides a general methodology for solving complex decision problems under uncertainty, which arise in many challenging real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), a framework deeply studied in the literature. We consider Policy Iteration (PI) algorithms for RL, which iteratively evaluate and improve control policies. When handling problems with continuous or very large state spaces, generalization is mandatory: the ability of an RL algorithm to generalize is an important factor in predicting values for unexplored states. A strong candidate for value function approximation is Support Vector Regression (SVR), known for its good generalization ability. SVR has been used in batch RL frameworks, but efficient implementations of incremental exact SVR can extend its generalization ability to online RL, where the expected reward from states changes constantly with experience. Our online SVR is thus a novel method that allows fast and accurate estimation of the value function, achieving the RL objective very efficiently. Simulation tests demonstrate the feasibility and usefulness of the proposed approach.
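The evaluate/improve loop of Policy Iteration described above can be sketched on a tiny tabular MDP. This is an illustrative toy only: the dynamics, rewards, and discount factor below are made up, and the paper's actual contribution (online SVR as the value function approximator) is not reproduced here.

```python
# Policy Iteration sketch on a hypothetical 2-state, 2-action
# deterministic MDP (toy dynamics chosen for illustration).
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
NEXT = [[0, 1], [0, 1]]          # NEXT[s][a]: deterministic successor state
REWARD = [[0.0, 1.0], [2.0, 0.0]]  # REWARD[s][a]: immediate reward

def evaluate(policy, tol=1e-9):
    """Iterative policy evaluation: V(s) <- R(s,pi(s)) + gamma*V(s')."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            a = policy[s]
            v_new = REWARD[s][a] + GAMMA * V[NEXT[s][a]]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

def improve(V):
    """Greedy policy improvement with respect to the current V."""
    return [max(range(N_ACTIONS),
                key=lambda a: REWARD[s][a] + GAMMA * V[NEXT[s][a]])
            for s in range(N_STATES)]

def policy_iteration():
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [0] * N_STATES
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy

policy, V = policy_iteration()
```

In problems with continuous or very large state spaces, the exact table `V` above becomes infeasible; that is where a regressor such as SVR replaces the table, predicting values for states never visited.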
Citation: Esposito, G., Martin, M. Approximate policy iteration using regularized Bellman residuals minimization. "Journal of Experimental & Theoretical Artificial Intelligence", 2016, vol. 28, no. 1-2, pp. 3-12.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder. If you wish to make any use of the work not provided for in the law, please contact: firstname.lastname@example.org