Bayesian analysis of frequency count data

Font Valverde, Martí; Puig Oriol, Xavier; Ginebra Molins, Josep

doi:10.1080/00949655.2011.600311

dc.contributor.author	Font Valverde, Martí
dc.contributor.author	Puig Oriol, Xavier
dc.contributor.author	Ginebra Molins, Josep
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
dc.date.accessioned	2012-01-25T12:11:32Z
dc.date.available	2012-01-25T12:11:32Z
dc.date.created	2011
dc.date.issued	2011
dc.identifier.citation	Font, M.; Puig, X.; Ginebra, J. Bayesian analysis of frequency count data. "Journal of statistical computation and simulation", 2011, p. 1-18.
dc.identifier.issn	0094-9655
dc.identifier.uri	http://hdl.handle.net/2117/14798
dc.description.abstract	The zero truncated inverse Gaussian–Poisson model, obtained by first mixing the Poisson model assuming its expected value has an inverse Gaussian distribution and then truncating the model at zero, is very useful when modelling frequency count data. A Bayesian analysis based on this statistical model is implemented on the word frequency counts of various texts, and its validity is checked by exploring the posterior distribution of the Pearson errors and by implementing posterior predictive consistency checks. The analysis based on this model is useful because it allows one to use the posterior distribution of the model mixing density as an approximation of the posterior distribution of the density of the word frequencies of the vocabulary of the author, which is useful to characterize the style of that author. The posterior distribution of the expectation and of measures of the variability of that mixing distribution can be used to assess the size and diversity of his vocabulary. An alternative analysis is proposed based on the inverse Gaussian-zero truncated Poisson mixture model, which is obtained by switching the order of the mixing and the truncation stages. Even though this second model fits some of the word frequency data sets more accurately than the first model, in practice the analysis based on it is not as useful because it does not allow one to estimate the word frequency distribution of the vocabulary.
dc.format.extent	18 p.
dc.language.iso	eng
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Mathematics and statistics::Applied statistics
dc.subject.lcsh	Bayesian statistical decision theory
dc.title	Bayesian analysis of frequency count data
dc.type	Article
dc.subject.lemac	Data analysis
dc.subject.lemac	Vocabulary -- Statistical models
dc.contributor.group	Universitat Politècnica de Catalunya. GRESA - Grup de recerca en estadística aplicada
dc.identifier.doi	10.1080/00949655.2011.600311
dc.rights.access	Restricted access - publisher's policy
local.identifier.drac	8957862
dc.description.version	Postprint (published version)
local.citation.author	Font, M.; Puig, X.; Ginebra, J.
local.citation.publicationName	Journal of statistical computation and simulation
local.citation.startingPage	1
local.citation.endingPage	18
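
The two constructions named in the abstract differ only in the order of the mixing and truncation stages, which is easier to see in a small simulation. The sketch below is illustrative only and is not the authors' Bayesian analysis. The function names and the values of `mean` and `scale` are hypothetical, and NumPy's `wald` parameterization of the inverse Gaussian is an assumption that may not match the one used in the paper.

```python
import numpy as np

def sample_truncated_mixture(n, mean, scale, rng):
    """Mix first, then truncate at zero.

    Draw lambda from an inverse Gaussian, draw a Poisson count given
    lambda, and discard every pair whose count is zero.
    """
    kept = []
    while len(kept) < n:
        lam = rng.wald(mean, scale, size=n)   # inverse Gaussian rates
        x = rng.poisson(lam)                  # Poisson counts given the rates
        kept.extend(x[x > 0].tolist())        # truncation of the mixture
    return np.array(kept[:n])

def sample_mixture_of_truncated(n, mean, scale, rng):
    """Truncate first, then mix.

    Draw lambda from an inverse Gaussian and keep every lambda; for each
    one, redraw the Poisson count until it is positive.
    """
    lam = rng.wald(mean, scale, size=n)
    x = rng.poisson(lam)
    zero = x == 0
    while zero.any():
        x[zero] = rng.poisson(lam[zero])      # rejection step per lambda
        zero = x == 0
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = sample_truncated_mixture(5000, mean=2.0, scale=2.0, rng=rng)
    b = sample_mixture_of_truncated(5000, mean=2.0, scale=2.0, rng=rng)
    print("mix then truncate: sample mean", a.mean())
    print("truncate then mix: sample mean", b.mean())
```

In the first sampler, discarding zero counts also discards the rates that produced them, so small rates are under-represented among the retained draws. In the second sampler every drawn rate contributes exactly one positive count.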

Files in this item

Name: 00949655.2011.pdf
Size: 738.5 KB
Format: PDF
