The multi-armed bandit (MAB) problem takes its name from casino slot machines. It asks how a gambler should pull the arms in order to maximize total reward. The gambler must decide which arms to explore to gain more knowledge about their payoffs, and which arm to exploit to secure the total payoff. The problem is also common in the real world, for example in automatic content selection. Here the website plays the role of the gambler: it must select suitable content to recommend to visitors so as to maximize the click-through rate (CTR). Bandit algorithms are well suited to this kind of problem because they handle the exploration-exploitation trade-off on rapidly changing, high-churn data. When context is taken into account during content selection, the task is modeled as a contextual bandit problem. In this thesis, we evaluate several popular bandit algorithms in different bandit settings and propose our own approach to a real-world automatic content selection case. Our experiments demonstrate that bandit algorithms are efficient in automatic content selection.