A new deep reinforcement learning architecture for autonomous UAVs
Tutor / director / evaluator: Barrado Muxí, Cristina
Document type: Bachelor thesis
Rights access: Open Access
Recent improvements in computation and algorithmic research, together with the rise of Big Data, have allowed Artificial Intelligence to gain widespread popularity. The recently published Deep Q-Network (DQN) algorithm, which combines Q-learning with deep neural networks, has been shown to learn to solve complex tasks, such as playing Atari games, in an unknown environment solely by gathering experience. These results open the door to many other applications, such as autonomous vehicles, medicine or production lines. The work preceding this project focused on building a baseline architecture that enables Unmanned Aerial Vehicles (UAVs) to learn to behave autonomously. In this project we provide different architectures for scaling this solution. To evaluate the convergence of the algorithm, we create challenging tasks involving obstacle avoidance and reaching a goal position inside a realistic simulated environment. The proposed solution allows UAVs to move autonomously in three dimensions as well as to control and modify their velocities. Modifications to the architecture provide different approaches to learning, which are evaluated through their training-efficiency metrics and test results. Development has focused on integrating Deep Learning and Reinforcement Learning tools such as Keras and OpenAI Gym in order to build a modular and accessible framework capable of training and testing Deep Reinforcement Learning (DRL) models for autonomous UAVs within simulated environments. The results of the experiments show multiple improvements over previous work and provide useful insights into further potential improvements. In this project, we successfully outperform the existing baseline Double Deep Q-Learning architecture for autonomous UAVs, obtaining 49% higher average reward and no collisions on a non-trivial task within a realistic simulated environment.
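The baseline that this work is compared against is Double Deep Q-Learning. It differs from plain DQN mainly in how the bootstrap target for a transition $(s, a, r, s')$ is built. DQN uses $y = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$. Double DQN instead lets the online network choose the next action and the target network evaluate it: $y = r + \gamma\, Q_{\text{target}}(s', \arg\max_{a'} Q_{\text{online}}(s', a'))$. This reduces overestimation of action values. The following is a minimal pure-Python sketch for a single transition, assuming Q-values are supplied as plain lists. The function names are hypothetical and are not taken from the thesis; a Keras-based implementation would compute the same targets in batches from model predictions.

```python
from typing import List


def dqn_target(reward: float, done: bool, gamma: float,
               q_target_next: List[float]) -> float:
    """DQN target: bootstrap from the max Q-value of the target network."""
    if done:  # terminal state: no future reward to bootstrap from
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: List[float],
                      q_target_next: List[float]) -> float:
    """Double DQN target: the online net selects a', the target net evaluates it."""
    if done:
        return reward
    # argmax over actions using the online network's estimates
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for three actions in the next state s'
q_online = [1.0, 2.5, 0.5]
q_target = [3.0, 1.5, 0.0]
print(dqn_target(1.0, False, 0.5, q_target))                   # 2.5
print(double_dqn_target(1.0, False, 0.5, q_online, q_target))  # 1.75
```

In this example, plain DQN bootstraps from the target network's largest (possibly overestimated) value of 3.0. Double DQN uses the target network's estimate of the action the online network prefers (1.5), which yields a more conservative target.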