Preserving empirical data utility in k-anonymous microaggregation via linear discriminant analysis

Rodríguez Hoyos, Ana Fernanda; Rebollo Monedero, David; Estrada Jiménez, José Antonio; Forné Muñoz, Jorge; Urquiza Aguiar, Luis Felipe

doi:10.1016/j.engappai.2020.103787


File: Rodriguez - LDA 202006.pdf (1.858 MB, PDF)



Document type: Article

Publication date: 2020-09-01

Publisher: Elsevier

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Today’s countless benefits of exploiting data come with a hefty price in terms of privacy. k-Anonymous microaggregation is a powerful technique devoted to revealing useful demographic information of microgroups of people, whilst protecting the privacy of individuals therein. Evidently, the inherent distortion of data results in the degradation of its utility. This work proposes and analyzes an anonymization method that draws upon the technique of linear discriminant analysis (LDA), with the aim of preserving the empirical utility of data. Further, this utility is measured as the accuracy of a machine learning model trained on the microaggregated data. By transforming the original data records to a different data space, LDA enables k-anonymous microaggregation to build microcells more tailored to an intrinsic classification threshold. To do this, first, data is rotated (projected) towards the direction of maximum discrimination and, second, scaled in this direction by a factor that penalizes distortion across the classification threshold. The upshot is that thinner cells are built along the threshold, which ends up preserving data utility in terms of the accuracy of machine learned models for a number of standardized data sets.
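The two steps described in the abstract (find the direction of maximum discrimination, then scale the data along it before forming cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-class Fisher discriminant, the scaling factor `alpha`, and the greedy grouping heuristic in `microaggregate` are assumptions chosen for illustration, and all function names are hypothetical.

```python
import numpy as np

def lda_direction(X, y):
    """Unit vector of Fisher's linear discriminant for two classes labeled 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix (sum of the two per-class scatters).
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # A small ridge term keeps the linear solve stable (an assumption of this sketch).
    w = np.linalg.solve(Sw + 1e-9 * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

def stretch_along(X, w, alpha):
    """Scale the component of every record along the unit vector w by alpha."""
    proj = X @ w
    return X + (alpha - 1.0) * np.outer(proj, w)

def microaggregate(X, Z, k):
    """Toy k-anonymous microaggregation (assumes len(X) >= k).

    Groups are formed from distances in the transformed space Z, while the
    published records are the group centroids computed in the original space X.
    Greedy heuristic: take the record farthest from the centroid of the
    unassigned records and group it with its k - 1 nearest unassigned neighbors.
    """
    n = len(X)
    remaining = np.arange(n)
    labels = np.empty(n, dtype=int)
    g = 0
    while len(remaining) >= 2 * k:
        R = Z[remaining]
        far = np.argmax(np.linalg.norm(R - R.mean(axis=0), axis=1))
        dist = np.linalg.norm(R - R[far], axis=1)
        pick = np.argsort(dist)[:k]
        labels[remaining[pick]] = g
        remaining = np.delete(remaining, pick)
        g += 1
    labels[remaining] = g  # last cell holds between k and 2k - 1 records
    X_anon = np.empty(X.shape, dtype=float)
    for c in range(g + 1):
        X_anon[labels == c] = X[labels == c].mean(axis=0)
    return X_anon, labels

# Usage on synthetic two-class data (illustrative values only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
w = lda_direction(X, y)
Z = stretch_along(X, w, alpha=4.0)  # alpha > 1 penalizes spread across the threshold
X_anon, labels = microaggregate(X, Z, k=5)
```

With `alpha` greater than 1, distances along the discriminant direction weigh more when cells are formed, so cells tend to be narrow in that direction and are less likely to straddle the classification threshold. Setting `alpha = 1.0` reduces the sketch to plain microaggregation in the original space.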


Citation: Rodríguez-Hoyos, A. [et al.]. Preserving empirical data utility in k-anonymous microaggregation via linear discriminant analysis. "Engineering applications of artificial intelligence", 1 September 2020, vol. 94, p. 103787:1-103787:13.

URI: http://hdl.handle.net/2117/330076

DOI: 10.1016/j.engappai.2020.103787

ISSN: 0952-1976

Publisher version: https://www.sciencedirect.com/science/article/abs/pii/S0952197620301792


