Extending the ROCKET Machine Learning algorithm to improve Multivariate Time Series classification

Cite as:
hdl:2117/420920
Covenantee: Kungliga Tekniska högskolan
Document type: Master thesis
Date: 2024-02-09
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
While the norm in Time Series Classification (TSC) has been to improve accuracy, new models focusing on efficiency have recently been attracting attention. In particular, models known as "ROCKET", which randomly generate a large number of kernels and use them as feature extractors to train a simple ridge classifier, can yield results as good as other state-of-the-art algorithms while being significantly more efficient. Although ROCKET models were originally designed for Univariate Time Series (UTS), which consist of a single channel or sequence, these classifiers have also shown excellent results when tested on Multivariate Time Series (MTS), where the characteristics of the time series are spread across multiple channels. Therefore, it is of scientific interest to explore these models to assess their overall performance and whether their efficiency can be further improved. Recent studies present a novel algorithm named Sequential Feature Detachment (SFD) which, applied on top of ROCKET, can significantly reduce the model size while slightly increasing accuracy through a sequential feature selection technique. Despite these remarkable results, the experiments supporting these conclusions were limited to UTS, leaving room for the exploration of this algorithm on MTS. Consequently, this thesis evaluates different strategies to implement the ROCKET and SFD algorithms for MTS classification tasks, focusing not only on improving efficiency and accuracy, but also on adding interpretability to the classifier. To achieve this, experiments were conducted by testing model ensembles, grouping channels based on predictability, and examining channel relevance alongside SFD. The University of East Anglia (UEA) MTS archive was used to evaluate the resulting models, as is common for TSC algorithms.
The results demonstrate that model ensembling does not increase accuracy on the test sets and that the predictability of individual channels is not maintained across dataset splits. However, the study shows that using SFD with MiniROCKET, a variant of ROCKET that includes random channel combinations, can not only improve classification results but also provide a statistically significant channel relevance measure.
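The kernel-based feature extraction summarised in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `make_kernels` and `transform`, the kernel lengths, the weight and bias distributions, the absence of padding, and the way channels are sampled are all assumptions chosen for illustration. Each random kernel is applied to a random subset of channels and summarised by two values, the proportion of positive outputs and the maximum output. In a ROCKET-style pipeline these features would then be passed to a ridge classifier, which is omitted here.

```python
import random


def make_kernels(n_kernels, n_channels, series_length, seed=0):
    """Draw random kernels (illustrative parameters, not those of the thesis)."""
    if series_length < 12:
        raise ValueError("series_length must be at least 12 for kernels of length up to 11")
    rng = random.Random(seed)
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Largest power-of-two dilation for which the kernel still fits in the series.
        max_exp = 0
        while (length - 1) * 2 ** (max_exp + 1) < series_length:
            max_exp += 1
        dilation = 2 ** rng.randint(0, max_exp)
        # Random channel combination: the kernel reads a random subset of channels.
        channels = rng.sample(range(n_channels), rng.randint(1, n_channels))
        kernels.append((weights, bias, dilation, channels))
    return kernels


def transform(series, kernels):
    """Map one multivariate series (list of equal-length channels) to a feature vector."""
    n = len(series[0])
    features = []
    for weights, bias, dilation, channels in kernels:
        span = (len(weights) - 1) * dilation
        outputs = []
        for start in range(n - span):  # no padding, for simplicity
            total = bias
            for c in channels:
                for j, w in enumerate(weights):
                    total += w * series[c][start + j * dilation]
            outputs.append(total)
        ppv = sum(1 for v in outputs if v > 0) / len(outputs)  # proportion of positive values
        features.extend([ppv, max(outputs)])
    return features


# Toy usage: one random series with 2 channels of length 50.
toy_rng = random.Random(1)
toy_series = [[toy_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(2)]
kernels = make_kernels(100, n_channels=2, series_length=50)
features = transform(toy_series, kernels)  # two features per kernel
```

Because each kernel records which channels it reads, the features that survive a later selection step can be traced back to channels, which is the kind of link a channel relevance measure would rely on.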
Subjects: Time-series analysis, Multivariate analysis--Computer programs
Degree: Master's Degree in Advanced Telecommunication Technologies (2019 curriculum)
Files | Description | Size | Format | View |
---|---|---|---|---|
Solana_Adria_Thesis.pdf | | 1,259Mb | PDF | View/Open |