Market Basket analysis in Retail

Reig Grau, Gerard

dc.contributor	Sànchez-Marrè, Miquel
dc.contributor.author	Reig Grau, Gerard
dc.date.accessioned	2017-11-04T12:33:07Z
dc.date.available	2017-11-04T12:33:07Z
dc.date.issued	2017-05
dc.identifier.uri	http://hdl.handle.net/2117/109798
dc.description	In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV)
dc.description.abstract	This Master's thesis report describes a full end-to-end data science project carried out at CleverData, a successful start-up specialised in data mining techniques and analytics tools, for one of its clients, an important Spanish retail company. The project had two aims: to analyse the possibly different selling behaviour of the client's stores, and to analyse customers' purchase behaviour, also known as Market Basket Analysis. Both analyses were meant to confirm the client's hypotheses that different customer purchasing profiles and different store selling profiles exist within the company. The project was divided into three tasks. The first task focused on studying, detecting and validating the different behaviour profiles of the client's stores. This analysis was carried out as a descriptive process using clustering techniques. To guarantee a minimum level of robustness in the profiles obtained, three clustering algorithms were used: a hierarchical agglomerative clustering technique, a partitional clustering technique with a fixed number of clusters (K-means) and a partitional clustering technique that automatically detects the number of clusters (G-means). The clusters produced by each algorithm were then analysed and compared. First, the similarity of cluster composition across algorithms was analysed. Second, the partition produced by each method was structurally validated using four Cluster Validity Indices (CVIs): the Minimum Cluster Separation Index, the Maximum Cluster Diameter Index, the Dunn Index and the Davies-Bouldin Index. Finally, the best partition from a technical point of view was identified. The client could then interpret and validate the meaning of the clusters obtained.
Once the partition most meaningful to the client had been chosen, the second task was to provide a descriptive analysis of the clusters that would be as meaningful as possible to the client. To that end, some common techniques were used, such as computing the cluster centroids and characterising each cluster through the variables used. However, an important obstacle arose in this task: the number of variables was so high (around 400) that the client could not analyse and summarise the selling behaviour profiles of the different shops. The proposed solution was to apply a feature selection approach that took advantage of the clustering already performed, and to aggregate variables with a temporal relationship. To do so, the cluster to which each store belonged was recorded as the label of a newly created class variable. A Random Forest ensemble technique was then selected and applied to the new dataset. In addition to being able to predict the label of a new, unlabelled instance or observation, this discriminant technique provides information about which attributes are relevant for discrimination (i.e., the ones used in the trees of the forest). Based on these most important attributes, the descriptive analysis of each cluster was carried out, and the client could interpret and fully understand it. The third task focused on analysing customers' purchase behaviour using the historical purchase tickets recorded over one year. To identify possibly different purchase patterns, an associative model was applied to find out whether co-occurrences or associations could be identified; specifically, the association rules model was used. Because the set of clusters was meaningful to the client, it was decided to analyse purchase behaviour locally within each cluster.
Therefore, each cluster was examined to discover associations or co-occurrences in the purchase patterns of its customers. As a result, association rules were discovered for the purchase patterns in each store. Two strategies were used to generate the rules: one based on the Lift measure and one based on the Leverage measure. To summarise and conclude the analysis, the results were published on a web page so that the client could access them easily. The report gradually explains how the project was developed, from the first step of defining the objectives to the delivery of the final results. The project used the Python language and machine learning libraries, as well as BigML, a tool that offers machine learning as a service. At the end of the project, the results achieved were analysed and compared against the project's initial goals, with satisfactory results from both the client's practical point of view and a technical point of view.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.subject	UPC subject areas::Computer science
dc.subject.lcsh	Data mining
dc.subject.lcsh	Artificial intelligence
dc.subject.other	Market Basket Analysis
dc.subject.other	Association rules
dc.subject.other	Unsupervised learning
dc.subject.other	Supervised learning
dc.subject.other	Cluster Validation Indices
dc.subject.other	K-means
dc.subject.other	G-means
dc.subject.other	Hierarchical agglomerative clustering
dc.subject.other	Random Forests
dc.subject.other	Retail
dc.title	Market Basket analysis in Retail
dc.type	Master thesis
dc.subject.lemac	Data mining
dc.subject.lemac	Artificial intelligence
dc.identifier.slug	129057
dc.rights.access	Open Access
dc.date.updated	2017-05-12T04:00:48Z
dc.audience.educationlevel	Master's
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	MASTER'S DEGREE IN ARTIFICIAL INTELLIGENCE (2012 curriculum)
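
The abstract names four Cluster Validity Indices used to validate the store partitions but does not define them. The Python sketch below shows one common formulation of each on made-up 2-D points. It assumes Euclidean distance, single-linkage separation (the closest pair of points in different clusters), complete diameter (the farthest pair of points within one cluster) and the centroid-based Davies-Bouldin formulation; the thesis may use other variants, and all function names are illustrative. Under these definitions the Dunn Index is the Minimum Cluster Separation divided by the Maximum Cluster Diameter.

```python
import math
from itertools import combinations


def min_cluster_separation(clusters):
    """Smallest Euclidean distance between two points in different clusters."""
    return min(
        math.dist(p, q)
        for ci, cj in combinations(clusters, 2)
        for p in ci
        for q in cj
    )


def max_cluster_diameter(clusters):
    """Largest Euclidean distance between two points of the same cluster."""
    return max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,  # returned when every cluster is a singleton
    )


def dunn_index(clusters):
    """Separation over diameter: higher means compact, well-separated clusters.
    Assumes at least one cluster with two or more points."""
    return min_cluster_separation(clusters) / max_cluster_diameter(clusters)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def davies_bouldin_index(clusters):
    """Mean over clusters of the worst ratio (scatter_i + scatter_j) / d(c_i, c_j):
    lower means better-separated clusters. Assumes two or more distinct centroids."""
    centroids = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Two well-separated toy clusters of 2-D points (made-up data).
toy = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
print(dunn_index(toy), davies_bouldin_index(toy))
```

On this toy partition the separation is 4.0 and the diameter 1.0, so the Dunn Index is 4.0, and the Davies-Bouldin Index is 0.25. Higher Dunn values and lower Davies-Bouldin values both point to a more compact, better-separated partition, which is why the two indices can be read together when comparing the hierarchical, K-means and G-means results.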
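
The abstract says the association rules were generated with two strategies, one based on Lift and one based on Leverage, without defining either measure. Using the standard definitions, for a rule X → Y, support(X) is the fraction of baskets (purchase tickets) containing X, lift = support(X ∪ Y) / (support(X) · support(Y)) and leverage = support(X ∪ Y) − support(X) · support(Y). The Python sketch below is a brute-force illustration restricted to one-item → one-item rules over hypothetical toy baskets; it is neither the Apriori algorithm nor how BigML generates rules, and the function names are illustrative.

```python
from itertools import permutations


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence, lift and leverage of the rule antecedent -> consequent."""
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    s_ac = support(set(antecedent) | set(consequent), baskets)
    return {
        "support": s_ac,
        "confidence": s_ac / s_a,
        # Lift > 1: items co-occur more often than if purchases were independent.
        "lift": s_ac / (s_a * s_c),
        # Leverage > 0: the same idea, expressed as an absolute difference.
        "leverage": s_ac - s_a * s_c,
    }


def one_to_one_rules(baskets, min_support, rank_by):
    """Enumerate every single-item rule {a} -> {c}, keep those with enough
    support, and sort them by the chosen measure ("lift" or "leverage")."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, c in permutations(items, 2):
        metrics = rule_metrics({a}, {c}, baskets)
        if metrics["support"] >= min_support:
            rules.append(({a}, {c}, metrics))
    rules.sort(key=lambda rule: rule[2][rank_by], reverse=True)
    return rules


# Hypothetical toy tickets: each basket is the set of products on one ticket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

for rank_by in ("lift", "leverage"):
    best = one_to_one_rules(baskets, min_support=0.25, rank_by=rank_by)[0]
    print(rank_by, best[0], "->", best[1], round(best[2][rank_by], 3))
```

On these toy baskets both strategies rank {bread} → {butter} first (lift ≈ 1.333, leverage 0.125). On larger data they can disagree: lift is a ratio and can favour rare items that almost always appear together, whereas leverage is an absolute difference and tends to favour rules involving more frequently purchased items.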

Files in this item

Name: 129057.pdf
Size: 3.360 MB
Format: PDF

This item appears in the following collections:

Master in Artificial Intelligence - MAI [278]

UPCommons. UPC's open knowledge portal
