Reinforcement learning applied as a game mechanic and design element in a custom boss for World of Warcraft
Tutor / director / evaluator: Martín Muñoz, Mario
Document type: Master thesis
Rights access: Open Access
Reinforcement Learning is one of the main categories of Machine Learning algorithms and has seen many applications across a wide variety of fields. Meanwhile, the video game industry has been growing continuously for many years. Video games offer many diverse environments that can be used to develop and research new and powerful algorithms. Moreover, in recent years several very interesting projects have been testing the current limits of applying Reinforcement Learning within complex game environments. In this project, I propose the use of Reinforcement Learning as one of the main features of the AI system of a boss fight within the massively multiplayer online game World of Warcraft. The project focuses on treating this AI approach as a relevant element during the design of a boss encounter. The idea is to apply Reinforcement Learning so that the NPC makes better use of its available mechanics, allowing it to surprise players while still being able to lose the fight. To do that, I design and implement a complex fight with various mechanics for a 15-man raid, where the RL AI controls the choice of which mechanic to execute during the 2 main stages of the fight. The project is done using an MMORPG framework based on TrinityCore, built in C++. The ML aspect is implemented using Python scripts, adapting an implementation of the current state-of-the-art Rainbow algorithm and adding the capability of asynchronous learning. Both systems are put together using the CPython libraries available for the C language. The experiments are done on an online World of Warcraft: Cataclysm private server, with an average of 100 players online. The results indicate that, with the corresponding architecture and resources, it is viable to apply RL to the MMORPG environment of World of Warcraft.
It is also possible to design an RL AI model that respects the established constraints and can modify its behaviour to get the best performance out of its actions. The application of asynchronous Rainbow worked correctly and allowed the agent to be trained, based on the corresponding parameters, within a parallel environment that could have many agents, each within its own instance of the environment. Using a predefined scripted behaviour to train the network in the early stages was useful to keep the early choices under control and to train all the necessary actions while considering the constraints. There is still room for larger and more complex research using RL and MMORPG games, together with the application of other, different approaches.
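The combination described above, scripted choices in the early stages followed by learned choices that still respect the encounter constraints, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function name `choose_mechanic`, its parameters, and the idea of a fixed warm-up step count are all assumptions made for illustration, and the Q-values are passed in as a plain list rather than produced by a Rainbow network.

```python
def choose_mechanic(q_values, allowed, step, warmup_steps, scripted_order):
    """Return the index of the next boss mechanic (hypothetical helper).

    q_values       -- one estimated value per mechanic (assumed to come from the agent)
    allowed        -- one boolean per mechanic; False means a constraint forbids it now
    step           -- number of choices made so far
    warmup_steps   -- assumed length of the scripted early stage
    scripted_order -- assumed fixed rotation of mechanic indices used during warm-up
    """
    if not any(allowed):
        raise ValueError("no mechanic is currently allowed")

    if step < warmup_steps:
        # Scripted stage: follow the fixed rotation, skipping forbidden mechanics.
        for offset in range(len(scripted_order)):
            candidate = scripted_order[(step + offset) % len(scripted_order)]
            if allowed[candidate]:
                return candidate

    # Learned stage (or no scripted option available): greedy choice
    # restricted to the mechanics that the constraints currently allow.
    candidates = [i for i, ok in enumerate(allowed) if ok]
    return max(candidates, key=lambda i: q_values[i])
```

In this sketch the constraints are applied as a mask in both stages, so a forbidden mechanic is never selected even when its estimated value is the highest.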