An optimization-based decomposition heuristic for the microaggregation problem

dc.contributor.author: Castro Pérez, Jordi
dc.contributor.author: Gentile, Claudio
dc.contributor.author: Spagnolo Arrizabalaga, Enrique
dc.contributor.group: Universitat Politècnica de Catalunya. GNOM - Grup d'Optimització Numèrica i Modelització
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
dc.date.accessioned: 2023-03-29T07:35:47Z
dc.date.available: 2023-03-29T07:35:47Z
dc.date.issued: 2022
dc.description.abstract: Given a set of points, the microaggregation problem aims to find a clustering with a minimum sum of squared errors (SSE), where the cardinality of each cluster is greater than or equal to k. Points in each cluster are replaced by the cluster centroid, thus satisfying k-anonymity. Microaggregation is considered one of the most effective techniques for numerical microdata protection. Traditionally, non-optimal solutions to the microaggregation problem are obtained by heuristic approaches. Recently, the authors of this paper presented a mixed integer linear optimization (MILO) approach based on column generation for computing tight solutions and lower bounds to the microaggregation problem. However, MILO can be computationally expensive for large datasets. In this work we present a new heuristic that combines three blocks: (1) a decomposition of the dataset into subsets, (2) the MILO column generation algorithm applied to each subset in order to obtain a valid microaggregation, and (3) a local search improvement algorithm to get the final clustering. Preliminary computational results show that this approach was able to provide (and even improve upon) some of the best solutions (i.e., of smallest SSE) reported in the literature for the Tarragona and Census datasets, and k ∈ {3, 5, 10}.
dc.description.peerreviewed: Peer Reviewed
dc.description.version: Postprint (author's final draft)
dc.format.extent: 12 p.
dc.identifier.citation: Castro, J.; Gentile, C.; Spagnolo, E. An optimization-based decomposition heuristic for the microaggregation problem. In: Privacy in Statistical Databases. "Privacy in statistical databases: International Conference, PSD 2022, Paris, France, September 21-23, 2022, proceedings". Berlin: Springer, 2022, p. 3-14. ISBN 978-3-031-13945-1. DOI 10.1007/978-3-031-13945-1_1.
dc.identifier.doi: 10.1007/978-3-031-13945-1_1
dc.identifier.isbn: 978-3-031-13945-1
dc.identifier.other: http://www-eio.upc.edu/~jcastro/publications/papers/lncs2022.pdf
dc.identifier.uri: https://hdl.handle.net/2117/385663
dc.language.iso: eng
dc.publisher: Springer
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-097580-B-I00/ES/MODELIZACION Y OPTIMIZACION DE PROBLEMAS ESTRUCTURADOS DE GRAN ESCALA Y APLICACIONES/
dc.relation.publisherversion: https://link.springer.com/book/10.1007/978-3-031-13945-1
dc.rights.access: Open Access
dc.subject: UPC subject areas::Mathematics and statistics::Mathematical statistics
dc.subject: UPC subject areas::Mathematics and statistics::Operations research::Mathematical programming
dc.subject.ams: AMS Classification::60 Probability theory and stochastic processes::60D05 Geometric probability, stochastic geometry, random sets
dc.subject.ams: AMS Classification::90 Operations research, mathematical programming::90C Mathematical programming
dc.subject.lcsh: Sampling (Statistics)
dc.subject.lcsh: Programming (Mathematics)
dc.subject.lemac: Sampling (Statistics)
dc.subject.lemac: Programming (Mathematics)
dc.subject.other: Statistical disclosure control
dc.subject.other: Microdata
dc.subject.other: Microaggregation problem
dc.subject.other: Mixed integer linear optimization
dc.subject.other: Column generation
dc.subject.other: Local search
dc.subject.other: Heuristics
dc.title: An optimization-based decomposition heuristic for the microaggregation problem
dc.type: Conference report
dspace.entity.type: Publication
local.citation.author: Castro, J.; Gentile, C.; Spagnolo, E.
local.citation.contributor: Privacy in Statistical Databases
local.citation.endingPage: 14
local.citation.publicationName: Privacy in statistical databases: International Conference, PSD 2022, Paris, France, September 21-23, 2022, proceedings
local.citation.pubplace: Berlin
local.citation.startingPage: 3
local.identifier.drac: 34249581
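
To make the objective in the abstract concrete, the Python sketch below computes the SSE of a clustering, checks the condition that every cluster has at least k points, and replaces each point by its cluster centroid. It also includes a generic greedy local search that moves single points between clusters while keeping every cluster at size k or more. This is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, the decomposition and MILO column-generation blocks are not reproduced, and the starting clustering is a hand-made feasible partition standing in for the output of those blocks.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def group(data: List[Point], labels: List[int]) -> Dict[int, List[Point]]:
    """Collect the points of each cluster, keyed by cluster label."""
    clusters: Dict[int, List[Point]] = {}
    for point, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(point)
    return clusters


def total_sse(data: List[Point], labels: List[int]) -> float:
    """Sum of squared Euclidean distances from every point to its cluster centroid."""
    sse = 0.0
    for pts in group(data, labels).values():
        c = centroid(pts)
        sse += sum((p[d] - c[d]) ** 2 for p in pts for d in range(len(c)))
    return sse


def is_k_feasible(labels: List[int], k: int) -> bool:
    """True if every cluster contains at least k points (the k-anonymity condition)."""
    return all(labels.count(lab) >= k for lab in set(labels))


def local_search(data: List[Point], labels: List[int], k: int,
                 max_rounds: int = 100) -> Tuple[List[int], float]:
    """Hypothetical greedy improvement: move one point to another cluster whenever
    the move lowers total SSE and its source cluster keeps at least k points."""
    labels = list(labels)
    best = total_sse(data, labels)
    cluster_ids = sorted(set(labels))
    for _ in range(max_rounds):
        improved = False
        for i in range(len(data)):
            src = labels[i]
            if labels.count(src) <= k:
                continue  # removing point i would leave its cluster with < k points
            for dst in cluster_ids:
                if dst == src:
                    continue
                labels[i] = dst
                cost = total_sse(data, labels)
                if cost < best - 1e-12:
                    best, src, improved = cost, dst, True  # keep the move
                else:
                    labels[i] = src  # undo the move
        if not improved:
            break  # local optimum with respect to single-point moves
    return labels, best


def microaggregate(data: List[Point], labels: List[int]) -> List[List[float]]:
    """Replace every point by the centroid of its cluster (the protected release)."""
    cents = {lab: centroid(pts) for lab, pts in group(data, labels).items()}
    return [cents[lab] for lab in labels]


if __name__ == "__main__":
    data = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
    labels, sse = local_search(data, [0, 0, 1, 0, 1, 1], k=2)
    print(labels, sse)  # [0, 0, 0, 1, 1, 1] 4.0
    print(microaggregate(data, labels))
```

For clarity the sketch recomputes the full SSE after every tentative move, which is quadratic in the number of points; on datasets the size of Tarragona or Census one would instead update the two affected cluster costs incrementally. Note also that single-point moves cannot leave a cluster that already has exactly k points, so swap moves or other neighbourhoods would be needed to escape such configurations.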

Files

Original bundle

Name: lncs2022.pdf
Size: 345.26 KB
Format: Adobe Portable Document Format