On the measure and the estimation of the evenness and diversity of vocabulary
Rights access: Restricted access - publisher's policy
Modelling word or species frequency count data through zero-truncated Poisson mixture models allows one to interpret the mixing distribution of the model as the distribution of the word or species frequencies in the vocabulary or population. As a consequence, estimates of the mixing density can be used as a fingerprint of the style of an author in his or her texts, or of an ecosystem in its samples. Definitions of a measure of the evenness and of a measure of the diversity within a vocabulary or population are given, and the novelty of these definitions is explained. It is then proposed that the evenness and the diversity of a vocabulary or population be approximated through the expectation of these measures under the word or species frequency distribution. This leads to assessing the lack of diversity through measures of the variability of the estimated mixing frequency distribution.
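For orientation, one common way of writing a zero-truncated Poisson mixture is sketched below. This is an illustrative form only: the symbols $p_r$, $\lambda$ and $\psi$ are assumptions introduced here, and the parameterization used in the article itself may differ. Here $p_r$ denotes the probability that a word (or species) observed at least once appears exactly $r$ times in the sample, and $\psi$ is the mixing density that the abstract interprets as the distribution of word or species frequencies.

```latex
% Zero-truncated Poisson mixture (illustrative notation, not taken from the article).
% \lambda : Poisson rate of a word or species; \psi : mixing density over \lambda.
\[
  p_r \;=\;
  \frac{\displaystyle\int_0^\infty \frac{e^{-\lambda}\lambda^{r}}{r!}\,\psi(\lambda)\,d\lambda}
       {\displaystyle 1-\int_0^\infty e^{-\lambda}\,\psi(\lambda)\,d\lambda},
  \qquad r = 1, 2, \ldots
\]
```

The denominator removes the unobservable class $r = 0$, that is, the words or species that are part of the vocabulary or population but do not appear in the sample.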
Citation: Ginebra, J.; Puig, X. On the measure and the estimation of the evenness and diversity of vocabulary. "Computational Statistics and Data Analysis", 2010, vol. 54, no. 9, p. 2187-2201.