Two regimes in the frequency of words and the origin of complex lexicons: Zipf's law revisited
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Zipf's law states that the frequency of a word is a power function of its rank. The exponent of the power is usually accepted to be close to (-)1. Large deviations between the predicted and real number of different words in a text, disagreements between the predicted and real exponent of the probability density function, and statistics on a large corpus make evident that word frequency as a function of rank follows two different exponents: ~(-)1 for the first regime and ~(-)2 for the second. The implications of this change in exponents for the metrics of texts and for the origins of complex lexicons are analyzed.
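As a rough illustration of the two-regime picture described in the abstract, the following Python sketch models unnormalized word frequency as a piecewise power law of rank. The function names, the `scale` parameter, and especially the `break_rank` default are hypothetical choices made for this example; the abstract does not state where the change of exponent occurs, and this is not the authors' fitting procedure.

```python
import math

def two_regime_frequency(rank, break_rank=5000, alpha1=1.0, alpha2=2.0, scale=1.0):
    """Unnormalized frequency of the word at a given rank.

    break_rank is an assumed, illustrative crossover rank; alpha1 and
    alpha2 are the magnitudes of the two exponents (~1 and ~2).
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    if rank <= break_rank:
        # First regime: f(r) proportional to r^(-alpha1)
        return scale * rank ** (-alpha1)
    # Second regime: f(r) proportional to r^(-alpha2), with a prefactor
    # chosen so both branches agree at break_rank (continuous curve).
    return scale * break_rank ** (alpha2 - alpha1) * rank ** (-alpha2)

def local_exponent(r1, r2, **kwargs):
    """Slope of log(frequency) versus log(rank) between two ranks."""
    f1 = two_regime_frequency(r1, **kwargs)
    f2 = two_regime_frequency(r2, **kwargs)
    return (math.log(f2) - math.log(f1)) / (math.log(r2) - math.log(r1))
```

With the default parameters, `local_exponent(10, 100)` is approximately -1 and `local_exponent(10000, 100000)` is approximately -2, which is the change of slope one would look for on a log-log plot of frequency against rank.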
Citation: Ferrer-i-Cancho, R.; Solé, R. V. Two regimes in the frequency of words and the origin of complex lexicons: Zipf's law revisited. "Journal of quantitative linguistics", August 2001, vol. 8, no. 3, p. 165-173.