Reverse curriculum hierarchical recursive learning
Cite as:
hdl:2117/371021
Document type: Official master's degree final project
Date: 2022-06-29
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
This thesis presents a study on Hierarchical Reinforcement Learning in which several learning approaches are researched, developed, and tested. In particular, the proposed algorithm, Reverse Curriculum Vicinity Learning (RCVL), achieved excellent performance in the tested environments. It is built on a two-level hierarchy: the high level learns to recursively suggest suitable subgoals to the low level, while the low level learns the sequence of primitive actions needed to reach those subgoals and, ultimately, the final goal. The algorithm is currently designed only for discrete Reinforcement Learning environments. RCVL outperformed the state-of-the-art baselines DDQN and DDQN combined with Hindsight Experience Replay (HER) in the more complex tested environment, reaching a success rate above 97% while avoiding infeasible subgoal suggestions and constructing optimal paths. Finally, it proved robust to most hyperparameter changes.

Hierarchical learning breaks a task into several smaller sub-tasks, which results in faster learning, since smaller tasks are easier to master. Each level of the hierarchy works at its own "resolution" of the problem (i.e. a different time scale), and only the low-level policy interacts with the environment. In our setting, the subgoals proposed by the high-level policy can be seen as milestones that break the overall task into several shorter ones.

The proposed algorithm also integrates the concept of Reverse Curriculum Learning: learning begins from states near the goal and gradually expands to harder tasks starting from states further away, until the whole state space is mastered. With this curricular approach the agent learns faster: it first masters the easy tasks and is then challenged with harder ones. Concretely, the high level stores neighbours from the vicinity of each goal (collected through low-level interactions) such that the goal is reachable from them within a limited number of actions; meanwhile, the low-level policy learns the primitive actions that solve the short trajectories from those neighbours to the goal. As knowledge accumulates at both levels, the high-level policy learns to draw a path backwards from the goal to the current state recursively, suggesting subgoals along the way. By learning long-term return estimates, the agent can decide which subgoal is best for any given pair of state and goal (or subgoal).

Additional concepts are integrated to accelerate learning and improve sample efficiency. First, RCVL is an off-policy algorithm. Second, the reward system is designed to extract the maximum information when rolling out the collected experience, so that all possible ordered combinations are stored with a non-sparse reward. Third, the algorithm uses hindsight relabelling of experience (as in HER), exploiting the accumulated experience more efficiently.
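To make the reverse-curriculum idea concrete, the sketch below runs it on a toy one-dimensional chain environment. Everything here (the `Chain` class, the `vicinity` helper, the two-step horizon) is an illustrative assumption, not the thesis' implementation; in particular, `vicinity` finds goal neighbours by brute force, whereas the thesis collects them from low-level interaction.

```python
class Chain:
    """Toy discrete goal-reaching environment: states 0..n-1 and
    actions -1/+1; an episode succeeds on reaching `goal`."""
    def __init__(self, n=10, goal=9):
        self.n, self.goal = n, goal

    def step(self, state, action):
        nxt = min(max(state + action, 0), self.n - 1)
        return nxt, nxt == self.goal

def vicinity(env, goal, horizon):
    """States from which `goal` is reachable within `horizon`
    primitive actions (brute force here, for illustration only)."""
    reached = {goal}
    for _ in range(horizon):
        reached |= {s for s in range(env.n) for a in (-1, 1)
                    if env.step(s, a)[0] in reached}
    return reached - {goal}

env = Chain()
curriculum = [env.goal]
for _ in range(3):
    frontier = vicinity(env, curriculum[-1], horizon=2)
    # ... train the low-level policy on short (start, subgoal) tasks here ...
    curriculum.append(min(frontier))   # push the start states further from the goal
print(curriculum)                      # [9, 7, 5, 3]
```

The printed curriculum shows the expansion described in the abstract: training starts in the goal's immediate vicinity and each stage moves the start states further away until the whole chain is covered.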
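The recursive backwards path construction by the high level can be sketched in the same spirit. The decision rule below, scoring a candidate subgoal by the sum of return estimates for the state-to-subgoal and subgoal-to-goal legs, is a plausible reading of "long-term return estimation learning", not the thesis' exact rule; `q` and `neighbours` are hypothetical stand-ins for the learned value table and the stored vicinities.

```python
from collections import defaultdict

def best_subgoal(q, state, goal, candidates):
    """Pick the stored neighbour of `goal` with the highest estimated
    long-term return over both legs of the decomposed task."""
    return max(candidates, key=lambda sg: q[(state, sg)] + q[(sg, goal)])

def plan_backwards(q, neighbours, state, goal, depth=5):
    """Recursively draw a path from the goal back toward `state`,
    emitting the subgoals suggested along the way."""
    cands = neighbours[goal]
    if depth == 0 or not cands or state in cands:
        return [goal]
    sg = best_subgoal(q, state, goal, cands)
    return plan_backwards(q, neighbours, state, sg, depth - 1) + [goal]

q = defaultdict(float)                      # return estimates for (state, goal) pairs
neighbours = defaultdict(set, {9: {7, 8}, 7: {5, 6}, 8: {6, 7}})
print(plan_backwards(q, neighbours, 5, 9))  # e.g. [7, 9] (ties broken arbitrarily)
```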
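Finally, the "all possible ordered combinations" roll-out pairs naturally with hindsight relabelling: every later state in a trajectory can be reused as an achieved goal for every earlier state. A minimal sketch follows; the negative step-count reward is an assumption standing in for whatever non-sparse reward the thesis defines.

```python
def rollout_combinations(trajectory):
    """Turn one trajectory into samples for every ordered state pair,
    relabelling the later state as the goal (hindsight) and shaping
    the reward by the number of steps between the two states."""
    samples = []
    for i, s in enumerate(trajectory):
        for j in range(i + 1, len(trajectory)):
            samples.append((s, trajectory[j], -(j - i)))  # negative step cost
    return samples

# e.g. rollout_combinations([3, 4, 5]) ->
# [(3, 4, -1), (3, 5, -2), (4, 5, -1)]
```

A single rollout of length T thus yields on the order of T^2 / 2 off-policy samples, which is one way to read the abstract's claim of sample efficiency.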
Subjects: Reinforcement learning, Artificial intelligence, Machine learning
Degree: Master's Degree in Artificial Intelligence (2017 curriculum)
File | Size | Format
---|---|---
170433.pdf | 2.451 MB | PDF