An optimization-based decomposition heuristic for the microaggregation problem
| dc.contributor.author | Castro Pérez, Jordi |
| dc.contributor.author | Gentile, Claudio |
| dc.contributor.author | Spagnolo Arrizabalaga, Enrique |
| dc.contributor.group | Universitat Politècnica de Catalunya. GNOM - Grup d'Optimització Numèrica i Modelització |
| dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa |
| dc.date.accessioned | 2023-03-29T07:35:47Z |
| dc.date.available | 2023-03-29T07:35:47Z |
| dc.date.issued | 2022 |
| dc.description.abstract | Given a set of points, the microaggregation problem aims to find a clustering with a minimum sum of squared errors (SSE), where the cardinality of each cluster is greater than or equal to k. Points in the cluster are replaced by the cluster centroid, thus satisfying k-anonymity. Microaggregation is considered one of the most effective techniques for numerical microdata protection. Traditionally, non-optimal solutions to the microaggregation problem are obtained by heuristic approaches. Recently, the authors of this paper presented a mixed integer linear optimization (MILO) approach based on column generation for computing tight solutions and lower bounds to the microaggregation problem. However, MILO can be computationally expensive for large datasets. In this work we present a new heuristic that combines three blocks: (1) a decomposition of the dataset into subsets, (2) the MILO column generation algorithm applied to each dataset in order to obtain a valid microaggregation, and (3) a local search improvement algorithm to get the final clustering. Preliminary computational results show that this approach was able to provide (and even improve upon) some of the best solutions (i.e., of smallest SSE) reported in the literature for the Tarragona and Census datasets, and k ∈ {3, 5, 10}. |
| dc.description.peerreviewed | Peer Reviewed |
| dc.description.version | Postprint (author's final draft) |
| dc.format.extent | 12 p. |
| dc.identifier.citation | Castro, J.; Gentile, C.; Spagnolo, E. An optimization-based decomposition heuristic for the microaggregation problem. In: Privacy in Statistical Databases. "Privacy in statistical databases: International Conference, PSD 2022, Paris, France, September 21-23, 2022, proceedings". Berlin: Springer, 2022, p. 3-14. ISBN 978-3-031-13945-1. DOI 10.1007/978-3-031-13945-1_1. |
| dc.identifier.doi | 10.1007/978-3-031-13945-1_1 |
| dc.identifier.isbn | 978-3-031-13945-1 |
| dc.identifier.other | http://www-eio.upc.edu/~jcastro/publications/papers/lncs2022.pdf |
| dc.identifier.uri | https://hdl.handle.net/2117/385663 |
| dc.language.iso | eng |
| dc.publisher | Springer |
| dc.relation.projectid | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-097580-B-I00/ES/MODELIZACION Y OPTIMIZACION DE PROBLEMAS ESTRUCTURADOS DE GRAN ESCALA Y APLICACIONES/ |
| dc.relation.publisherversion | https://link.springer.com/book/10.1007/978-3-031-13945-1 |
| dc.rights.access | Open Access |
| dc.subject | Àrees temàtiques de la UPC::Matemàtiques i estadística::Estadística matemàtica |
| dc.subject | Àrees temàtiques de la UPC::Matemàtiques i estadística::Investigació operativa::Programació matemàtica |
| dc.subject.ams | Classificació AMS::60 Probability theory and stochastic processes::60D05 Geometric probability, stochastic geometry, random sets |
| dc.subject.ams | Classificació AMS::90 Operations research, mathematical programming::90C Mathematical programming |
| dc.subject.lcsh | Sampling (Statistics) |
| dc.subject.lcsh | Programming (Mathematics) |
| dc.subject.lemac | Mostreig (Estadística) |
| dc.subject.lemac | Programació (Matemàtica) |
| dc.subject.other | Statistical disclosure control |
| dc.subject.other | Microdata |
| dc.subject.other | Microaggregation problem |
| dc.subject.other | Mixed integer linear optimization |
| dc.subject.other | Column generation |
| dc.subject.other | Local search |
| dc.subject.other | Heuristics |
| dc.title | An optimization-based decomposition heuristic for the microaggregation problem |
| dc.type | Conference report |
| dspace.entity.type | Publication |
| local.citation.author | Castro, J.; Gentile, C.; Spagnolo, E. |
| local.citation.contributor | Privacy in Statistical Databases |
| local.citation.endingPage | 14 |
| local.citation.publicationName | Privacy in statistical databases: International Conference, PSD 2022, Paris, France, September 21-23, 2022, proceedings |
| local.citation.pubplace | Berlin |
| local.citation.startingPage | 3 |
| local.identifier.drac | 34249581 |
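
The objective described in the abstract (replace each point by its cluster centroid, require at least k points per cluster, and measure quality by the SSE) can be illustrated with a minimal Python sketch. This is not the authors' MILO, decomposition, or local search code; the function name `microaggregate`, the toy points, and the labels are assumptions made only for illustration.

```python
from collections import defaultdict


def microaggregate(points, labels, k):
    """Replace each point by its cluster centroid and return (protected, sse).

    Hypothetical helper: it only evaluates a given clustering; it does not
    search for a good one.
    """
    # Group point indices by cluster label.
    clusters = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)

    protected = [None] * len(points)
    sse = 0.0
    for lab, idx in clusters.items():
        # Cardinality constraint: every cluster needs at least k points.
        if len(idx) < k:
            raise ValueError(f"cluster {lab} has fewer than k={k} points")
        dim = len(points[idx[0]])
        centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        for i in idx:
            protected[i] = list(centroid)
            # Accumulate the squared distance from the point to its centroid.
            sse += sum((points[i][d] - centroid[d]) ** 2 for d in range(dim))
    return protected, sse


# Toy data (assumed): two well-separated groups of three points each.
points = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0],
          [10.0, 10.0], [10.0, 12.0], [12.0, 10.0]]
labels = [0, 0, 0, 1, 1, 1]
protected, sse = microaggregate(points, labels, k=3)
print(protected[0], sse)
```

With k = 3 both clusters are valid, and every released record is shared by at least three original points; calling the same function with k = 4 raises an error because neither cluster is large enough.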
Files
Original bundle
- Name: lncs2022.pdf
- Size: 345.26 KB
- Format: Adobe Portable Document Format



