Identifying bias in cluster quality metrics

Cite as: hdl:2117/386014
Document type: Research report
Defense date: 2021-12-12
Rights access: Open Access
This work is protected by the corresponding intellectual and industrial property rights.
Except where otherwise noted, its contents are licensed under a Creative Commons license: Attribution 4.0 International
Abstract
We study potential biases of popular cluster quality metrics, such as conductance and modularity. We propose a method that uses both stochastic block models and preferential attachment block models to generate networks with preset community structures, to which the quality metrics are then applied. These models also allow us to generate multi-level structures of varying strength, which reveal whether a metric favours partitions into a larger or a smaller number of clusters. Additionally, we propose another quality metric, the density ratio.
We observed that most of the studied metrics tend to favour partitions into a smaller number of large clusters, even when the relative internal and external connectivity of the clusters is the same. The metrics found to be less biased are modularity and density ratio.
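The kind of experiment the abstract describes can be illustrated with a minimal sketch. It is not the authors' code: it covers only the stochastic block model (not the preferential attachment variant), it omits the density ratio because the abstract does not define it, and the block sizes, the edge probabilities `p_in`, `p_mid` and `p_out`, and all function names are assumptions chosen for illustration. The sketch samples a two-level network (four blocks grouped into two super-clusters) and scores the fine and the coarse partition with modularity and mean conductance.

```python
import random
from itertools import combinations


def sbm_edges(sizes, probs, seed=0):
    """Sample an undirected stochastic block model; return (edges, block_of)."""
    rng = random.Random(seed)
    # block_of[node] is the index of the block the node belongs to.
    block_of = [b for b, size in enumerate(sizes) for _ in range(size)]
    edges = [
        (u, v)
        for u, v in combinations(range(len(block_of)), 2)
        if rng.random() < probs[block_of[u]][block_of[v]]
    ]
    return edges, block_of


def modularity(edges, label):
    """Newman modularity: sum over clusters of m_c / m - (vol_c / 2m)^2."""
    m = len(edges)
    inside, vol = {}, {}
    for u, v in edges:
        for x in (u, v):
            vol[label[x]] = vol.get(label[x], 0) + 1
        if label[u] == label[v]:
            inside[label[u]] = inside.get(label[u], 0) + 1
    return sum(inside.get(c, 0) / m - (vol[c] / (2 * m)) ** 2 for c in vol)


def mean_conductance(edges, label):
    """Average over clusters of cut_c / min(vol_c, 2m - vol_c); lower is better."""
    m = len(edges)
    cut, vol = {}, {}
    for u, v in edges:
        for x in (u, v):
            vol[label[x]] = vol.get(label[x], 0) + 1
        if label[u] != label[v]:
            for x in (u, v):
                cut[label[x]] = cut.get(label[x], 0) + 1
    return sum(cut.get(c, 0) / min(vol[c], 2 * m - vol[c]) for c in vol) / len(vol)


# Hypothetical parameters: blocks 0-1 and 2-3 form two super-clusters.
sizes = [50, 50, 50, 50]
p_in, p_mid, p_out = 0.30, 0.10, 0.02
probs = [
    [p_in, p_mid, p_out, p_out],
    [p_mid, p_in, p_out, p_out],
    [p_out, p_out, p_in, p_mid],
    [p_out, p_out, p_mid, p_in],
]

edges, fine = sbm_edges(sizes, probs, seed=0)
coarse = [b // 2 for b in fine]  # merge each pair of blocks into one cluster

scores = {
    name: {
        "modularity": modularity(edges, lab),
        "conductance": mean_conductance(edges, lab),
    }
    for name, lab in (("fine", fine), ("coarse", coarse))
}
for name, s in scores.items():
    print(f"{name}: Q={s['modularity']:.3f}, conductance={s['conductance']:.3f}")
```

Varying `p_mid` between `p_out` and `p_in` changes how strong the upper level is relative to the lower one, which is one way to check whether a metric's preference for the coarse partition reflects the structure or only the number of clusters.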
Citation: Renedo, M.; Arratia, A. Identifying bias in cluster quality metrics. 2021. DOI 10.48550/arXiv.2112.06287.
Other identifiers: https://arxiv.org/abs/2112.06287
Files | Description | Size | Format | View |
---|---|---|---|---|
2112.06287.pdf | | 582.3 Kb | | View/Open |