Does k-anonymous microaggregation affect machine-learned macrotrends?

Rodríguez Hoyos, Ana; Estrada Jiménez, José; Rebollo Monedero, David; Parra Arnau, Javier; Forné Muñoz, Jorge

doi:10.1109/ACCESS.2018.2834858

Document type: Article

Publication date: 2018-05-16

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Abstract

In the era of big data, the availability of massive amounts of information makes privacy protection more necessary than ever. Among a variety of anonymization mechanisms, microaggregation is a common approach to satisfy the popular requirement of k-anonymity in statistical databases. In essence, k-anonymous microaggregation aggregates quasi-identifiers to hide the identity of each data subject within a group of k - 1 other subjects. As with any perturbative mechanism, however, anonymization comes at the cost of some information loss that may hinder the ultimate purpose of the released data, which very often is building machine-learning models for macrotrend analysis. To assess the impact of microaggregation on the utility of the anonymized data, it is necessary to evaluate the resulting accuracy of said models. In this paper, we address the problem of measuring the effect of k-anonymous microaggregation on the empirical utility of microdata. We quantify utility accordingly as the accuracy of classification models learned from microaggregated data and evaluated over original test data. Our experiments indicate, with some consistency, that the impact of the de facto microaggregation standard (maximum distance to average vector) on the performance of machine-learning algorithms is often minor to negligible for a wide range of k, across a variety of classification algorithms and data sets. Furthermore, experimental evidence suggests that the traditional measure of distortion in the microdata anonymization community may be inappropriate for evaluating the utility of microaggregated data.
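The evaluation protocol described in the abstract (train on microaggregated data, test on original data) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code. It assumes a simplified fixed-size variant of maximum distance to average vector (MDAV) as commonly described, treats every feature as a quasi-identifier, leaves class labels unperturbed, and uses synthetic two-class data with a nearest-centroid classifier purely as stand-ins. All function names (`mdav`, `microaggregate`, `make_blobs`, and so on) are hypothetical.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def mdav(points, k):
    """Simplified MDAV-style partition: index groups of size k to 2k-1."""
    if len(points) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(points)))
    groups = []

    def take_group(seed):
        # Group the k remaining records closest to record `seed`.
        remaining.sort(key=lambda i: dist(points[i], points[seed]))
        groups.append(remaining[:k])
        del remaining[:k]

    def farthest_from(ref):
        return max(remaining, key=lambda i: dist(points[i], ref))

    while len(remaining) >= 3 * k:
        c = centroid([points[i] for i in remaining])
        r = farthest_from(c)          # record farthest from the average vector
        s = farthest_from(points[r])  # record farthest from r
        take_group(r)
        take_group(s)
    if len(remaining) >= 2 * k:
        c = centroid([points[i] for i in remaining])
        take_group(farthest_from(c))
    groups.append(list(remaining))    # leftover records form the last group
    return groups


def microaggregate(points, k):
    """Replace each record by the centroid of its group."""
    out = [None] * len(points)
    for group in mdav(points, k):
        c = centroid([points[i] for i in group])
        for i in group:
            out[i] = c
    return out


def fit_nearest_centroid(X, y):
    """Toy classifier (an assumption): one centroid per class label."""
    return {lab: centroid([x for x, t in zip(X, y) if t == lab]) for lab in set(y)}


def accuracy(model, X, y):
    hits = sum(min(model, key=lambda lab: dist(model[lab], x)) == t
               for x, t in zip(X, y))
    return hits / len(y)


def make_blobs(n, rng):
    """Synthetic two-class Gaussian data; illustrative only."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        mu = 3.0 * label
        X.append((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)))
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_blobs(300, rng)
X_test, y_test = make_blobs(200, rng)

baseline = accuracy(fit_nearest_centroid(X_train, y_train), X_test, y_test)
print(f"original training data: accuracy = {baseline:.3f}")
for k in (2, 5, 10, 25):
    X_anon = microaggregate(X_train, k)  # only the training features are perturbed
    acc = accuracy(fit_nearest_centroid(X_anon, y_train), X_test, y_test)
    print(f"k = {k:2d}: accuracy = {acc:.3f}")
```

The test set is deliberately left untouched, mirroring the abstract's definition of utility as accuracy over original test data. Results on this toy data say nothing about the data sets or classifiers studied in the paper.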

Citation: Rodríguez-Hoyos, A., Estrada-Jimenez, J., Rebollo-Monedero, D., Parra-Arnau, J., Forne, J. Does k-anonymous microaggregation affect machine-learned macrotrends? "IEEE Access", 16 May 2018, vol. 6, p. 28258-28277.

URI: http://hdl.handle.net/2117/121730

DOI: 10.1109/ACCESS.2018.2834858

ISSN: 2169-3536

Publisher version: https://ieeexplore.ieee.org/document/8360116/
