Online reinforcement learning using a probability density estimation
Cite as: hdl:2117/183344
Document type: Article
Publication date: 2017-01-01
Publisher: The MIT Press. Massachusetts Institute of Technology
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
Function approximation in online, incremental reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a Gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
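The difference between time-based forgetting and forgetting modulated by new information can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1-D mixture with two fixed components, and the decay rule `base_forgetting ** r` (where `r` is the component's responsibility for the new sample) is an illustrative choice, as are the names `Component`, `responsibilities`, `update`, and `simulate`. The density-dependent choice of the forgetting factor described in the abstract is not modeled here; a constant base factor is used instead.

```python
import math
from typing import List


class Component:
    """One 1-D Gaussian stored as decayed sufficient statistics."""

    def __init__(self, mean: float, var: float, mass: float) -> None:
        self.mass = mass                      # decayed sample count
        self.s1 = mass * mean                 # decayed sum of x
        self.s2 = mass * (var + mean * mean)  # decayed sum of x**2

    @property
    def mean(self) -> float:
        return self.s1 / self.mass

    @property
    def var(self) -> float:
        # Floor the variance so the density stays well defined.
        return max(self.s2 / self.mass - self.mean ** 2, 1e-6)

    def pdf(self, x: float) -> float:
        v = self.var
        return math.exp(-0.5 * (x - self.mean) ** 2 / v) / math.sqrt(2.0 * math.pi * v)


def responsibilities(components: List[Component], x: float) -> List[float]:
    """Posterior probability that each component generated x."""
    total = sum(c.mass for c in components)
    joint = [c.mass / total * c.pdf(x) for c in components]
    norm = sum(joint)
    if norm == 0.0:  # x is far from every component: share it equally
        return [1.0 / len(components)] * len(components)
    return [j / norm for j in joint]


def update(components: List[Component], x: float,
           base_forgetting: float = 0.99, modulated: bool = True) -> None:
    """Fold one sample into the mixture statistics."""
    resp = responsibilities(components, x)
    for c, r in zip(components, resp):
        # ASSUMPTION (illustrative rule): a component forgets only in
        # proportion to the new information it receives. With r close to 0
        # the decay is close to 1, so unvisited regions keep their statistics.
        # With modulated=False every component decays at every step.
        decay = base_forgetting ** r if modulated else base_forgetting
        c.mass = decay * c.mass + r
        c.s1 = decay * c.s1 + r * x
        c.s2 = decay * c.s2 + r * x * x


def simulate(modulated: bool, steps: int = 2000) -> List[Component]:
    """Sample only near x = 0 and return [near, far] components."""
    components = [Component(0.0, 1.0, 10.0), Component(10.0, 1.0, 10.0)]
    for k in range(steps):
        update(components, 0.5 * math.sin(k), modulated=modulated)
    return components


if __name__ == "__main__":
    for flag in (True, False):
        near, far = simulate(flag)
        print(f"modulated={flag}: far mass={far.mass:.3g}, far mean={far.mean:.3f}")
```

In this toy run all samples fall near 0. With the modulated rule, the component near 10 keeps essentially its initial mass; with purely time-based forgetting, its mass decays toward zero, so it would be overwritten by the first sample that later arrives in that region.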
Citation: Agostini, A.; Celaya, E. Online reinforcement learning using a probability density estimation. "Neural computation", 1 January 2017, vol. 29, no. 1, p. 220-246.
ISSN: 0899-7667
Publisher version: http://www.mitpressjournals.org/doi/10.1162/NECO_a_00906#.WKVj2PLXsbs
Other identifiers: http://www.iri.upc.edu/download/scidoc/1794
Files | Description | Size | Format | View |
---|---|---|---|---|
1794-Online-Rei ... ity-Density-Estimation.pdf | | 1.322 Mb | PDF | View/Open |