Detecting heterogeneity in generalized linear modeling
Cite as:
hdl:2117/100826
Document type: Master thesis
Date: 2017-02
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
In classical model-fitting techniques, such as traditional Multiple Linear Regression
models (MLR) or Generalized Linear Models (GLM), the assumption is that the
individuals come from a homogeneous population. However, this condition is
not necessarily met, as many factors may influence the behaviour of
the individuals and therefore bias the model estimates.
For instance, suppose that we want to study the salaries of a certain
set of individuals who come from a relatively well-defined professional sector. The first
approach would be to collect all possible modeling variables and fit the model. However,
this could lead to inaccurate estimates, since salaries
can be driven differently according to gender, region, and ethnicity, among others. These
variables are called segmentation variables, and their number may grow very fast.
This gives rise to a combinatorial problem, with many possible ways of grouping
those individuals.
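To give a feel for how quickly the number of groupings grows, here is a minimal Python sketch. The variable names and levels are hypothetical and chosen only for illustration; they are not taken from the thesis. It counts the cells obtained by crossing all segmentation variables, and the number of ways a single unordered variable can be split into two non-empty groups.

```python
from math import prod

# Hypothetical segmentation variables and levels (illustrative only).
segmentation = {
    "gender": ["female", "male"],
    "region": ["north", "south", "east", "west"],
    "ethnicity": ["a", "b", "c"],
}


def count_cells(variables):
    """Number of distinct level combinations when all variables are crossed."""
    return prod(len(levels) for levels in variables.values())


def count_binary_splits(levels):
    """Ways to split an unordered set of levels into two non-empty groups."""
    return 2 ** (len(levels) - 1) - 1


cells = count_cells(segmentation)
splits = {name: count_binary_splits(lv) for name, lv in segmentation.items()}

print("cells when crossing all variables:", cells)
print("binary splits per variable:", splits)
```

Adding one more variable multiplies the number of cells by its number of levels, which is why an exhaustive manual search over groupings quickly becomes impractical.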
Our main goal in this work is to go deeper into this kind of problem and present an
automatic solution to detect homogeneous segments within a heterogeneous population
in the GLM context. The PATHMOX methodology is a powerful method
proposed by Gastón (2009) [19] to automate the task of finding segments. The statistical
tests needed to guide the PATHMOX algorithm and discover the constructs
that differentiate those segments were proposed by Lamberti (2015) [8].
First, in section 2, we provide several ways to detect heterogeneity, by means of moderating
variables as in Covariance Analysis or by comparing coefficients using
parametric or non-parametric approaches. Additionally, in section 4, we present
a method to characterize classes or a continuous response by taking into account
only segmentation variables. Then, we concentrate on the Generalized
Linear Modeling context to define the automatic heterogeneity detection method,
and we present in detail all the required hypothesis testing procedures in section 3.
Finally, we carry out fairly extensive simulation studies and an application to a real
problem in sections 6 and 7, respectively.
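As an illustration of what a parametric comparison of coefficients between two segments can look like, the following Python sketch computes the classical Chow-type F statistic for a simple linear regression. This is a standard textbook technique used here as a stand-in, not necessarily the test developed in the thesis; in a GLM setting, an analogous comparison would typically be based on deviances rather than residual sums of squares. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def chow_f(x1, y1, x2, y2):
    """Chow-type F statistic: one pooled line versus one line per segment."""
    k = 2  # parameters per fitted line (intercept and slope)
    _, _, rss_pooled = fit_line(x1 + x2, y1 + y2)
    _, _, rss1 = fit_line(x1, y1)
    _, _, rss2 = fit_line(x2, y2)
    rss_separate = rss1 + rss2
    df_resid = len(x1) + len(x2) - 2 * k
    return ((rss_pooled - rss_separate) / k) / (rss_separate / df_resid)


# Hypothetical toy data: two segments with clearly different slopes.
x_a, y_a = [1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1]
x_b, y_b = [1, 2, 3, 4, 5], [1.0, 3.1, 4.9, 7.2, 8.8]

f_hetero = chow_f(x_a, y_a, x_b, y_b)  # segments differ
f_homo = chow_f(x_a, y_a, x_a, y_a)    # identical segments

print("F, different segments:", f_hetero)
print("F, identical segments:", f_homo)
```

A large F value suggests that separate coefficients per segment fit substantially better than a single pooled model. Under the usual linear-model assumptions, the statistic would be compared against an F distribution with k and n1 + n2 - 2k degrees of freedom.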
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
Files | Size |
---|---|
123241.pdf | 896,5Kb |