Information content versus word length in random typing

Cite as:
hdl:2117/176067
Document type: Article
Defense date: 2011-12
Publisher: Institute of Physics (IOP)
Rights access: Open Access
Abstract
Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al 2011 Proc. Nat. Acad. Sci. 108 3825). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process where a text is constructed by pressing keys at random from a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for three major variants of the random typing process. A strong correlation between information content and word length can simply arise from the units making a word (e.g., letters) and not necessarily from the interplay between a word and its context as proposed by Piantadosi and co-workers. In itself, the linear relation does not entail the results of any optimization process.
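The linearity described in the abstract can be illustrated with a minimal sketch of one simplified random typing setup. It is not the authors' code, and its slope and intercept are those of this toy setup only, not necessarily the exact values presented in the article. Assumptions: all letters are equally likely, the space is pressed with probability `p_space`, empty words (two consecutive spaces) are allowed, and a word's information content is taken as -log2 of its probability, since context carries no information when keys are pressed independently. The function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def word_information(length, n_letters=2, p_space=0.5):
    """Information content, in bits, of one specific word of the given length.

    Assumes equiprobable letters: the word's probability is
    ((1 - p_space) / n_letters) ** length * p_space.
    """
    p_letter = (1.0 - p_space) / n_letters
    return -(length * math.log2(p_letter) + math.log2(p_space))


def simulate_words(n_keystrokes, n_letters=2, p_space=0.5, seed=0):
    """Press keys at random and split the resulting text at every space."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"[:n_letters]
    words, current = [], []
    for _ in range(n_keystrokes):
        if rng.random() < p_space:
            words.append("".join(current))  # a space closes the current word
            current = []
        else:
            current.append(rng.choice(letters))
    return words


def empirical_information(words, word):
    """Estimate -log2 of the relative frequency of `word` in the sample."""
    counts = Counter(words)
    return -math.log2(counts[word] / len(words))


# Slope and intercept of the linear relation in this toy setup.
slope = word_information(1) - word_information(0)  # log2(n_letters / (1 - p_space))
intercept = word_information(0)                    # -log2(p_space)
```

With two letters and `p_space = 0.5`, each extra letter adds exactly 2 bits and the intercept is 1 bit, so information content grows linearly with length even though nothing in the process optimizes word lengths. Calling `empirical_information(simulate_words(200000), "ab")` gives an estimate close to the analytic value `word_information(2)`.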
Citation: Ferrer-i-Cancho, R.; Moscoso del Prado, F. Information content versus word length in random typing. "Journal of statistical mechanics: Theory and experiment", December 2011, vol. 2011, no. 12, article L12002, p. 1-8.
ISSN: 1742-5468
Publisher version: http://dx.doi.org/10.1088/1742-5468/2011/12/L12002
Files | Description | Size | Format |
---|---|---|---|
Information_content.pdf | | 115.5 KB | PDF |
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work is prohibited without permission of the copyright holder.