Classifying data from protected statistical datasets

Herranz Sotoca, Javier; Matwin, Stan; Nin Guerrero, Jordi; Torra i Reventós, Vicenç

doi:10.1016/j.cose.2010.05.005

dc.contributor.author	Herranz Sotoca, Javier
dc.contributor.author	Matwin, Stan
dc.contributor.author	Nin Guerrero, Jordi
dc.contributor.author	Torra i Reventós, Vicenç
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Matemàtica Aplicada IV
dc.date.accessioned	2012-01-05T13:01:13Z
dc.date.available	2012-01-05T13:01:13Z
dc.date.created	2010-06-09
dc.date.issued	2010-06-09
dc.identifier.citation	Herranz, J. [et al.]. Classifying data from protected statistical datasets. "Computers and security", 9 June 2010, vol. 29, no. 8, p. 875-890.
dc.identifier.issn	0167-4048
dc.identifier.uri	http://hdl.handle.net/2117/14416
dc.description.abstract	Statistical Disclosure Control (SDC) has been an active research area in recent years. The goal is to transform an original dataset X into a protected one X', such that X' does not reveal any relation between confidential and (quasi-)identifier attributes and such that X' can be used to compute reliable statistical information about X. Many specific protection methods have been proposed and analyzed with respect to the levels of privacy and utility that they offer. However, when measuring utility, only differences between the statistical values of X and X' are considered. This would indicate that datasets protected by SDC methods can be used only for statistical purposes. We show in this paper that this is not the case, because a protected dataset X' can be used to construct good classifiers for future data. To do so, we describe an extensive set of experiments that we have run with different SDC protection methods and different (real) datasets. In general, the resulting classifiers are very good, which is good news for both the SDC and the privacy-preserving data mining (PPDM) communities. In particular, our results question the necessity of some specific protection methods that have appeared in the PPDM literature with the clear goal of providing good classification.
dc.format.extent	16 p.
dc.language.iso	eng
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Computer science::Artificial intelligence::Machine learning
dc.subject.lcsh	Data mining
dc.title	Classifying data from protected statistical datasets
dc.type	Article
dc.subject.lemac	Mineria de dades
dc.contributor.group	Universitat Politècnica de Catalunya. MAK - Matemàtica Aplicada a la Criptografia
dc.identifier.doi	10.1016/j.cose.2010.05.005
dc.subject.inspec	INSPEC classification::Cybernetics::Artificial intelligence::Knowledge engineering::Knowledge acquisition::Data mining
dc.rights.access	Open Access
local.identifier.drac	2593631
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/235226/EU/Anonymity Enhancement for Information Society/ENONYMITY
local.citation.author	Herranz, J.; Matwin, S.; Nin, J.; Torra, V.
local.citation.publicationName	Computers and security
local.citation.volume	29
local.citation.number	8
local.citation.startingPage	875
local.citation.endingPage	890
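
The experiment outlined in the abstract (protect X to obtain X', train a classifier on X', then classify future unprotected records) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the protection method (individual-ranking microaggregation with group size k), the nearest-centroid classifier, the synthetic two-class data, and every function name are assumptions chosen for brevity. The record does not say which methods, classifiers, or datasets the paper used.

```python
import random
from statistics import mean


def microaggregate_column(values, k):
    """Individual-ranking microaggregation (assumed SDC method):
    sort the values, split them into consecutive groups of at least k,
    and replace each value by the mean of its group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    protected = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        if len(order) - end < k:  # fold a too-small tail into this group
            end = len(order)
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            protected[i] = group_mean
        start = end
    return protected


def protect(records, k):
    """Build X' from X by microaggregating each attribute independently."""
    columns = [microaggregate_column(list(col), k) for col in zip(*records)]
    return [list(row) for row in zip(*columns)]


def fit_centroids(records, labels):
    """Nearest-centroid classifier (assumed): one mean vector per class."""
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_class.items()}


def predict(centroids, record):
    """Return the class whose centroid is closest in squared distance."""
    return min(centroids, key=lambda label: sum(
        (c - x) ** 2 for c, x in zip(centroids[label], record)))


def accuracy(centroids, records, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(records, labels))
    return hits / len(records)


def make_data(rng, n_per_class):
    """Hypothetical data: two Gaussian classes centred at 0 and 6."""
    records, labels = [], []
    for label, centre in ((0, 0.0), (1, 6.0)):
        for _ in range(n_per_class):
            records.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(label)
    return records, labels


rng = random.Random(0)
train_x, train_y = make_data(rng, 50)  # original dataset X
test_x, test_y = make_data(rng, 50)    # "future" unprotected data
protected_x = protect(train_x, k=5)    # protected dataset X'

acc_original = accuracy(fit_centroids(train_x, train_y), test_x, test_y)
acc_protected = accuracy(fit_centroids(protected_x, train_y), test_x, test_y)
print("trained on X :", acc_original)
print("trained on X':", acc_protected)
```

Comparing the two printed accuracies mirrors the utility question the abstract raises: whether a model built from X' classifies new records about as well as one built from X. The sketch leaves the class labels unprotected, which is a further simplifying assumption.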

Files in this item

Name: Article_COSE_Herranz_Nin_postp ...
Size: 1.862 MB
Format: PDF
