Designing mixture of deep experts
Cite as: hdl:2117/115295
Carried out at/with: Université Laval
Document type: Official master's degree final project
Date: 2017-03-07
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
Mixture of Experts (MoE) is a classical ensemble architecture in which each
member is specialised in a given part of the input space, its area of expertise.
Working in this manner, we aim to specialise the experts on smaller problems,
solving the original problem through a divide-and-conquer approach.
The goal of our research is first to reproduce the work of Collobert et
al. [1] (2002) and then to extend it by using neural networks as experts
on different datasets. Specialised representations are learned over different
aspects of the problem, and the outputs of the different members are merged
according to their specific expertise. This expertise can itself be learned by a
network acting as a gating function.
The MoE architecture is composed of N expert networks. These experts are combined
via a gating network, which partitions the input space accordingly, a
divide-and-conquer strategy supervised by the gating network. Using a specialised
cost function, each expert specialises in its sub-space. Exploiting the discriminative
power of the experts in this way works much better than simply clustering the data.
The gating network, in turn, needs to learn how to assign examples to the different specialists.
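A minimal sketch of this soft-gated combination is shown below in PyTorch; the layer sizes, activations, and the softmax gate are illustrative assumptions, not the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Soft mixture of experts: output = sum_i gate_i(x) * expert_i(x)."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_experts):
        super().__init__()
        # Each expert is a small MLP that can specialise on part of the input space.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(in_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, out_dim),
            )
            for _ in range(num_experts)
        ])
        # The gating network produces one softmax weight per expert.
        self.gate = nn.Sequential(
            nn.Linear(in_dim, num_experts),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):
        weights = self.gate(x)                                   # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, out_dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # (batch, out_dim)


# Example: 4 experts classifying 784-dimensional inputs into 10 classes.
model = MixtureOfExperts(in_dim=784, hidden_dim=64, out_dim=10, num_experts=4)
logits = model(torch.randn(32, 784))  # shape: (32, 10)
```

Trained end-to-end with an ordinary classification loss, the gate learns which expert to trust on which region of the input, which is the divide-and-conquer behaviour described above.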
Such models show promise for building larger networks that are still cheap to
compute at test time and more parallelizable at training time. We were able to
reproduce the authors' work and implemented a multi-class gater to classify
images.
We know that neural networks perform best with lots of data. However,
some of our experiments require us to divide the dataset and train multiple neural
networks. We observe that, in this data-deprived condition, our MoE is almost on
par with, and competes with, ensembles trained on the complete data.
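One way to read "dividing the dataset" is a hard-assignment step in which each example is routed to the expert the gate ranks highest and the experts are then retrained on their own subsets. The helper below is a hypothetical sketch of that step, reusing the `MixtureOfExperts` class and `model` from the earlier snippet; it is not the exact reassignment schedule used in the thesis or in Collobert et al. [1].

```python
import torch

def partition_by_gate(model, x):
    """Hard assignment: route every example to its highest-scoring expert.

    Returns one tensor of examples per expert, so that each expert can be
    retrained on its own subset. Illustrative only; the actual schedule in
    the thesis may differ.
    """
    with torch.no_grad():
        assignment = model.gate(x).argmax(dim=-1)  # (batch,) chosen expert per example
    return [x[assignment == i] for i in range(len(model.experts))]

subsets = partition_by_gate(model, torch.randn(256, 784))
print([len(s) for s in subsets])  # number of examples each expert would receive
```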
Keywords: Machine Learning, Multi-Layer Perceptrons, Mixture of Experts, Support
Vector Machines, Divide and Conquer, Stochastic Gradient Descent, Optimization.
Degree: Master's Degree in Innovation and Research in Informatics (2012 curriculum)
Files | Description | Size | Format |
---|---|---|---|
127266.pdf | | 2.587 MB | PDF |