Outlier detection for multivariate categorical data
Rights access: Open Access
The detection of outlying rows in a contingency table is tackled from a Bayesian perspective by adapting the framework that Box and Tiao adopted for normal models to multinomial models with random effects. The solution assumes that the rows follow a two-component mixture of multinomial continuous mixtures, with one component for the nonoutlier rows and the other for the outlier rows. The method first estimates the distributional characteristics of the nonoutlier rows and then performs a cluster analysis to identify which rows belong to the outlier group and which do not. The method applies to any type of contingency table; in particular, it could be used in the analysis of multivariate categorical control charts. Here, the method is illustrated through a simulated example and applied to help identify heterogeneities of style among the acts of the plays in the First Folio edition of Shakespeare's drama.
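The abstract does not specify which multinomial continuous mixture is used or how its parameters are estimated. As a minimal sketch, assuming each component is a Dirichlet-multinomial (one common multinomial continuous mixture, not necessarily the authors' choice) and treating both components' parameters and the prior outlier probability as already known, the posterior probability that a row belongs to the outlier component follows from Bayes' rule. The function names `dirichlet_multinomial_logpmf` and `outlier_posterior`, and the parameter `prior_outlier`, are hypothetical.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial(alpha).

    Hypothetical helper: the Dirichlet-multinomial is an assumed choice of
    multinomial continuous mixture, used here only for illustration.
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: log n! - sum_j log x_j!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Dirichlet-multinomial kernel: Gamma ratios over the total and each cell
    log_kernel = (math.lgamma(a0) - math.lgamma(n + a0)
                  + sum(math.lgamma(x + a) - math.lgamma(a)
                        for x, a in zip(counts, alpha)))
    return log_coef + log_kernel


def outlier_posterior(rows, alpha_clean, alpha_outlier, prior_outlier=0.05):
    """Posterior probability that each row comes from the outlier component.

    Assumes a two-component mixture with known (hypothetical) parameters;
    the original method estimates these quantities instead.
    """
    probs = []
    for row in rows:
        log_clean = (math.log(1 - prior_outlier)
                     + dirichlet_multinomial_logpmf(row, alpha_clean))
        log_out = (math.log(prior_outlier)
                   + dirichlet_multinomial_logpmf(row, alpha_outlier))
        # Log-sum-exp keeps the normalisation numerically stable
        m = max(log_clean, log_out)
        log_norm = m + math.log(math.exp(log_clean - m) + math.exp(log_out - m))
        probs.append(math.exp(log_out - log_norm))
    return probs


# Nonoutlier rows concentrate around proportions (0.5, 0.3, 0.2);
# the outlier component is diffuse (uniform on the simplex).
rows = [[50, 30, 20], [5, 10, 85]]
probs = outlier_posterior(rows, alpha_clean=[25, 15, 10], alpha_outlier=[1, 1, 1])
print(probs)
```

Here the first row, which matches the nonoutlier proportions, receives a small outlier probability, while the second row is assigned almost surely to the outlier component. In a fully Bayesian treatment, the nonoutlier parameters would be estimated from the data first and the classification step would account for their uncertainty rather than plugging in fixed values.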
This is an Accepted Manuscript of an article published by Wiley in “Quality and Reliability Engineering International” on 6 June 2018, available online: https://onlinelibrary.wiley.com/doi/abs/10.1002/qre.2339
Citation: Puig, X., Ginebra, J. Outlier detection for multivariate categorical data. "Quality and reliability engineering international", 2018, vol. 34, no. 7, p. 1400-1412.