The Fast Maximum Distance to Average Vector (F-MDAV): an algorithm for k-Anonymous microaggregation in big data
Rights access: Restricted access - publisher's policy (embargoed until 2022-02-10)
The massive exploitation of large volumes of data is currently guiding critical decisions in domains such as economics and health. However, serious privacy risks arise, since personal data is commonly involved. k-Anonymous microaggregation is a well-known method that guarantees individuals' privacy while preserving much of the data's utility. Unfortunately, methods like this are computationally expensive in big data settings, whereas the application domain of the data might require an immediate response to make "life or death" decisions. Accordingly, this paper proposes five strategies to simplify the internal operations (such as distance calculations and element sorting) of the maximum distance to average vector (MDAV) method, the de facto microaggregation standard. To make the method usable in large-scale databases, these strategies, for example, reduce the number of operations needed to compute a distance from 3m to 2m, where m is the number of attributes of the data set, and reduce the complexity of sorting operations from O(n log n) to O(n), where n is the number of records. Through extensive experimentation over multiple data sets, we show that the new algorithm is significantly faster. Interestingly, the speedup factor of each individual technique is no greater than 2, but the multiplicative effect of combining them all makes the algorithm four times faster than the original microaggregation mechanism. This speedup is achieved, literally, at no additional cost in data utility, i.e., it does not incur greater information loss.
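The abstract names two kinds of simplification (cheaper distances and sorting replaced by something linear) without detailing them. The Python sketch below is a plain MDAV loop written under our own assumptions; it is not the authors' F-MDAV code. It obtains a roughly 2m-operation distance by caching squared norms and using ||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2, and it replaces sorting with quickselect, which finds the k nearest records in expected (not worst-case) O(n) time. Whether the paper uses these particular techniques is an assumption, and the names `sq_dist_expanded`, `k_smallest` and `mdav` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Record = Sequence[float]


def sq_dist_naive(x: Record, c: Record) -> float:
    # Baseline: a subtraction, a multiplication and an addition per attribute (~3m ops).
    return sum((a - b) * (a - b) for a, b in zip(x, c))


def sq_dist_expanded(x: Record, x_norm: float, c: Record, c_norm: float) -> float:
    # ||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2. With both squared norms cached,
    # each call only needs the dot product: one multiplication and one addition
    # per attribute (~2m ops). Assumed technique, not taken from the paper.
    return x_norm - 2.0 * sum(a * b for a, b in zip(x, c)) + c_norm


def k_smallest(pairs: List[Tuple[float, int]], k: int) -> List[int]:
    # Quickselect over (distance, index) pairs: returns the indices of the k
    # smallest distances, unordered, in expected O(n) time (no full sort).
    if k >= len(pairs):
        return [i for _, i in pairs]
    chosen: List[int] = []
    while True:
        pivot = random.choice(pairs)[0]
        less = [p for p in pairs if p[0] < pivot]
        equal = [p for p in pairs if p[0] == pivot]
        if k <= len(less):
            pairs = less
        elif k <= len(less) + len(equal):
            return chosen + [i for _, i in less] + [i for _, i in equal[: k - len(less)]]
        else:
            chosen += [i for _, i in less + equal]
            k -= len(less) + len(equal)
            pairs = [p for p in pairs if p[0] > pivot]


def mdav(records: List[Record], k: int) -> List[List[int]]:
    # Plain MDAV partition into clusters of record indices (assumes len(records) >= k).
    norms = [sum(v * v for v in x) for x in records]  # squared norms, computed once
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []

    def dists_to(point: Record, p_norm: float) -> List[Tuple[float, int]]:
        return [(sq_dist_expanded(records[i], norms[i], point, p_norm), i) for i in remaining]

    def farthest_from_centroid() -> int:
        m = len(records[0])
        c = [sum(records[i][j] for i in remaining) / len(remaining) for j in range(m)]
        return max(dists_to(c, sum(v * v for v in c)))[1]

    def take_cluster(r: int) -> None:
        # r's cluster: the k records nearest to r (r is at distance 0, so it is normally included).
        nonlocal remaining
        members = k_smallest(dists_to(records[r], norms[r]), k)
        clusters.append(members)
        gone = set(members)
        remaining = [i for i in remaining if i not in gone]

    while len(remaining) >= 3 * k:
        r = farthest_from_centroid()
        take_cluster(r)
        # s = record farthest from r. Barring distance ties, s is never in r's
        # cluster, so searching only the remaining records finds the same s.
        s = max(dists_to(records[r], norms[r]))[1]
        take_cluster(s)
    if len(remaining) >= 2 * k:
        take_cluster(farthest_from_centroid())
    if remaining:
        clusters.append(remaining)  # the last n' (k <= n' < 2k) records form one cluster
    return clusters
```

For example, `mdav(data, 5)` on 50 records yields ten clusters of five indices each; replacing every record by the centroid of its cluster then gives the microaggregated, k-anonymous data set. Quickselect was chosen here only because it is a standard expected-linear selection method; a deterministic linear-time selection (median of medians) would also fit the O(n) bound the abstract mentions.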
© 2019 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Citation: Rodríguez-Hoyos, A. [et al.]. The Fast Maximum Distance to Average Vector (F-MDAV): an algorithm for k-Anonymous microaggregation in big data. "Engineering applications of artificial intelligence", 10 February 2020, vol. 90, issue April 2020, p. 103531:1-103531:12.