Reverse curriculum hierarchical recursive learning
Cite as:
hdl:2117/371021
Document type: Official master's degree final project
Date: 2022-06-29
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
This thesis presents a study on Hierarchical Reinforcement Learning in which several learning approaches are researched, developed, and tested. In particular, the proposed algorithm, Reverse Curriculum Vicinity Learning (RCVL), achieved excellent performance in the tested environments. It is built on a two-level hierarchy: the high level learns to recursively suggest suitable subgoals to the low level, while the low level learns the sequence of primitive actions needed to reach those subgoals and, ultimately, the final goal. The algorithm is currently designed only for discrete Reinforcement Learning environments. RCVL outperformed the state-of-the-art baselines DDQN and DDQN combined with Hindsight Experience Replay (HER) in the more complex tested environment, reaching a success rate above 97% while avoiding infeasible subgoal suggestions and constructing optimal paths. Finally, it proved robust to most hyperparameter changes.

Hierarchical learning breaks a task into several smaller sub-tasks, which results in faster learning, since smaller tasks are easier to master. Each level of the hierarchy works at its own "resolution" of the problem (i.e. a different time scale), and only the low-level policy interacts with the environment. In our setting, the subgoals proposed by the high-level policy can be seen as milestones that break the overall task into several shorter ones.

The proposed algorithm also integrates the concept of Reverse Curriculum Learning: learning begins from states near the goal and gradually expands to harder tasks starting from states further away, until the whole state space is mastered. With this curricular approach the agent learns faster: it first masters the easy tasks and is then challenged with harder ones. Concretely, the high level stores neighbours from the vicinity of each goal (collected through low-level interactions) such that the goal is reachable from them within a limited number of actions; meanwhile, the low-level policy learns the primitive actions that solve the short trajectories from those neighbours to the goal. As knowledge accumulates at both levels, the high-level policy learns to draw a path backwards from the goal to the current state recursively, suggesting subgoals along the way. By learning long-term return estimates, the agent can decide which subgoal is best for any given pair of state and goal (or subgoal).

Additional concepts are integrated to accelerate learning and improve sample efficiency. First, RCVL is an off-policy algorithm. Second, the reward system is designed to extract the maximum information when rolling out the collected experience, so that all possible ordered combinations are stored with a non-sparse reward. Third, the algorithm uses hindsight relabelling of experience (as in HER), exploiting the accumulated experience more efficiently.
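To make the reverse-curriculum idea concrete, the sketch below runs it on a toy one-dimensional chain environment. Everything here (the `Chain` class, the `vicinity` helper, the two-step horizon) is an illustrative assumption, not the thesis' implementation; in particular, `vicinity` finds goal neighbours by brute force, whereas the thesis collects them from low-level interaction.

```python
class Chain:
    """Toy discrete goal-reaching environment: states 0..n-1 and
    actions -1/+1; an episode succeeds on reaching `goal`."""
    def __init__(self, n=10, goal=9):
        self.n, self.goal = n, goal

    def step(self, state, action):
        nxt = min(max(state + action, 0), self.n - 1)
        return nxt, nxt == self.goal

def vicinity(env, goal, horizon):
    """States from which `goal` is reachable within `horizon`
    primitive actions (brute force here, for illustration only)."""
    reached = {goal}
    for _ in range(horizon):
        reached |= {s for s in range(env.n) for a in (-1, 1)
                    if env.step(s, a)[0] in reached}
    return reached - {goal}

env = Chain()
curriculum = [env.goal]
for _ in range(3):
    frontier = vicinity(env, curriculum[-1], horizon=2)
    # ... train the low-level policy on short (start, subgoal) tasks here ...
    curriculum.append(min(frontier))   # push the start states further from the goal
print(curriculum)                      # [9, 7, 5, 3]
```

The printed curriculum shows the expansion described in the abstract: training starts in the goal's immediate vicinity and each stage moves the start states further away until the whole chain is covered.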
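The recursive backwards path construction by the high level can be sketched in the same spirit. The decision rule below, scoring a candidate subgoal by the sum of return estimates for the state-to-subgoal and subgoal-to-goal legs, is a plausible reading of "long-term return estimation learning", not the thesis' exact rule; `q` and `neighbours` are hypothetical stand-ins for the learned value table and the stored vicinities.

```python
from collections import defaultdict

def best_subgoal(q, state, goal, candidates):
    """Pick the stored neighbour of `goal` with the highest estimated
    long-term return over both legs of the decomposed task."""
    return max(candidates, key=lambda sg: q[(state, sg)] + q[(sg, goal)])

def plan_backwards(q, neighbours, state, goal, depth=5):
    """Recursively draw a path from the goal back toward `state`,
    emitting the subgoals suggested along the way."""
    cands = neighbours[goal]
    if depth == 0 or not cands or state in cands:
        return [goal]
    sg = best_subgoal(q, state, goal, cands)
    return plan_backwards(q, neighbours, state, sg, depth - 1) + [goal]

q = defaultdict(float)                      # return estimates for (state, goal) pairs
neighbours = defaultdict(set, {9: {7, 8}, 7: {5, 6}, 8: {6, 7}})
print(plan_backwards(q, neighbours, 5, 9))  # e.g. [7, 9] (ties broken arbitrarily)
```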
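Finally, the "all possible ordered combinations" roll-out pairs naturally with hindsight relabelling: every later state in a trajectory can be reused as an achieved goal for every earlier state. A minimal sketch follows; the negative step-count reward is an assumption standing in for whatever non-sparse reward the thesis defines.

```python
def rollout_combinations(trajectory):
    """Turn one trajectory into samples for every ordered state pair,
    relabelling the later state as the goal (hindsight) and shaping
    the reward by the number of steps between the two states."""
    samples = []
    for i, s in enumerate(trajectory):
        for j in range(i + 1, len(trajectory)):
            samples.append((s, trajectory[j], -(j - i)))  # negative step cost
    return samples

# e.g. rollout_combinations([3, 4, 5]) ->
# [(3, 4, -1), (3, 5, -2), (4, 5, -1)]
```

A single rollout of length T thus yields on the order of T^2 / 2 off-policy samples, which is one way to read the abstract's claim of sample efficiency.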
Subjects: Reinforcement learning, Artificial intelligence, Machine learning
Degree: Master's Degree in Artificial Intelligence (2017 curriculum)
File | Size | Format
---|---|---
170433.pdf | 2.451 MB | PDF