Correspondence analysis of textual data involving contextual information: CA-GALT on principal components
Rights access: Restricted access - publisher's policy
Correspondence analysis on an aggregated lexical table is a typical practice in textual analysis in which a contextual categorical variable is used to aggregate documents, depending on the categories to which they belong. This work generalises this approach and considers several quantitative, categorical or mixed contextual variables. The result is a new method that we have called 'correspondence analysis on a generalised aggregated lexical table'. A favoured application derives from surveys by questionnaire, including both open-ended and closed questions. The free-text answers are encoded into a respondents × words frequency table called a lexical table. The closed questions, either quantitative or categorical, form the contextual variables. The primary objective is to establish a typology of the variables and a typology of the words from their mutual relationships as grasped from jointly analysing the textual and contextual tables. Validation tests are offered, particularly in the form of confidence ellipses. The comprehensive and numerous properties of the method, similar to correspondence analysis properties, are detailed. Promising results are obtained as indicated by an application to a marketing survey conducted among 1,000 respondents.
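As an illustration of the classical starting point only, the following minimal sketch builds an aggregated lexical table from a toy lexical table and a single categorical contextual variable, by summing the word counts of the respondents that share a category. The function name `aggregated_lexical_table` and the toy data are hypothetical and not taken from the paper. The generalised table for several quantitative, categorical or mixed variables is not reproduced here.

```python
def aggregated_lexical_table(lexical, categories):
    """Sum word counts over respondents sharing the same category.

    lexical    : list of rows, one per respondent, one count per word
    categories : one category label per respondent (hypothetical input)
    Returns the sorted category labels and a words x categories table.
    """
    labels = sorted(set(categories))
    column = {label: j for j, label in enumerate(labels)}
    n_words = len(lexical[0])
    table = [[0] * len(labels) for _ in range(n_words)]
    for row, category in zip(lexical, categories):
        for word, count in enumerate(row):
            table[word][column[category]] += count
    return labels, table


# Toy lexical table: 4 respondents x 3 words (illustrative values only).
lexical = [
    [2, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 2, 0],
]
# One categorical contextual variable, e.g. an age group per respondent.
categories = ["young", "old", "young", "old"]

labels, table = aggregated_lexical_table(lexical, categories)
print(labels)
for word_row in table:
    print(word_row)
```

Each column of the resulting table is the pooled word profile of one category. Correspondence analysis would then be applied to this words × categories table, a step the sketch leaves out.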
Citation: Becue-Bertaut, M., Pages, J. Correspondence analysis of textual data involving contextual information: CA-GALT on principal components. "Advances in data analysis and classification", 1 June 2015, vol. 9, no. 2, p. 125-142.