Journal articles
http://hdl.handle.net/2117/3529
Fri, 27 Nov 2015 06:30:38 GMT

Secure and efficient anonymization of distributed confidential databases
http://hdl.handle.net/2117/76549
Herranz Sotoca, Javier; Nin Guerrero, Jordi
Let us consider the following situation: t entities (e.g., hospitals) hold different databases containing different records for the same type of confidential (e.g., medical) data. They want to deliver a protected version of this data to third parties (e.g., pharmaceutical researchers), preserving in some way both the utility and the privacy of the original data. This can be done by applying a statistical disclosure control (SDC) method. One possibility is that each entity protects its own database individually, but this strategy provides less utility and privacy than a collective strategy where the entities cooperate, by means of a distributed protocol, to produce a global protected dataset. In this paper, we investigate the problem of distributed protocols for SDC protection methods. We propose a simple, efficient and secure distributed protocol for the specific SDC method of rank shuffling. We run some experiments to evaluate the quality of this protocol and to compare the individual and collective strategies for solving the problem of protecting a distributed database. With respect to other distributed versions of SDC methods, the new protocol provides either more security or more efficiency, as we discuss throughout the paper.
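The distributed protocol itself is not reproduced in this abstract. As background, the underlying SDC method can be illustrated with a minimal single-party sketch of rank-based shuffling (in the spirit of rank swapping), where values are permuted only among records with nearby ranks. The function name and the window parameter are illustrative assumptions, not the paper's algorithm.

```python
import random

def rank_shuffle(values, window=3, seed=None):
    """Shuffle each value only among records whose ranks are close.

    values: one numeric column of the dataset.
    window: size of the rank-ordered blocks inside which values may mix.
    Returns a protected copy of the column.
    """
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices in rank order
    shuffled = list(values)
    # Walk over the ranked positions in blocks of `window` and permute
    # the values inside each block.
    for start in range(0, len(order), window):
        block = order[start:start + window]
        perm = block[:]
        rng.shuffle(perm)
        for src, dst in zip(block, perm):
            shuffled[dst] = values[src]
    return shuffled

protected = rank_shuffle([10, 42, 7, 99, 23, 56, 31], window=3, seed=1)
print(sorted(protected))  # the multiset of values is preserved
```

Because only values with nearby ranks mix, column-level statistics are approximately preserved while individual values are decoupled from their records.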
Wed, 02 Sep 2015 08:12:33 GMT

New results and applications for multisecret sharing schemes
http://hdl.handle.net/2117/27633
Herranz Sotoca, Javier; Ruiz Rodríguez, Alexandre; Sáez Moreno, Germán
In a multisecret sharing scheme (MSSS), different secrets are distributed among the players in some set, each one according to an access structure. The trivial solution to this problem is to run independent instances of a standard secret sharing scheme, one for each secret. In this solution, the length of the secret share to be stored by each player grows linearly with the number of secrets (when keeping all other parameters fixed). Multisecret sharing schemes have been studied by the cryptographic community mostly from a theoretical perspective: different models and definitions have been proposed, for both unconditional (information-theoretic) and computational security. In the case of unconditional security, there are two different definitions. It has been proved that, for some particular cases of access structures that include the threshold case, an MSSS with the strongest level of unconditional security must have shares with length linear in the number of secrets. Therefore, the optimal solution in this case is equivalent to the trivial one. In this work we prove that, even for a more relaxed notion of unconditional security, and for some kinds of access structures (in particular, threshold ones), we have the same efficiency problem: the length of each secret share must grow linearly with the number of secrets. Since we want more efficient solutions, we move to the scenario of MSSSs with computational security. We propose a new MSSS, where each secret share has constant length (just one element), and we formally prove its computational security in the random oracle model. To the best of our knowledge, this is the first formal analysis of the computational security of an MSSS. We show the utility of the new MSSS by using it as a key ingredient in the design of two schemes for two new functionalities: multipolicy signatures and multipolicy decryption. We prove the security of these two new multipolicy cryptosystems in a formal security model. The two new primitives provide similar functionalities to attribute-based cryptosystems, with some advantages and some drawbacks that we discuss at the end of this work.
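The "trivial solution" mentioned above can be made concrete: run one independent Shamir (t, n) instance per secret, so each player stores one sub-share per secret and its total share grows linearly with the number of secrets. This is a background sketch of the baseline, not the paper's constant-size scheme.

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is over GF(P)

def share(secret, t, n, rng):
    """Shamir (t, n) threshold sharing of a single secret."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    def f(x):  # evaluate the degree-(t-1) polynomial by Horner's rule
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % P
        return y
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation at x = 0 from t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

def trivial_msss(secrets, t, n, seed=0):
    """Trivial MSSS: one independent Shamir instance per secret.
    Each player's share list grows linearly with len(secrets)."""
    rng = random.Random(seed)
    instances = [share(s, t, n, rng) for s in secrets]
    # player p's share is one (x, y) point per secret
    return [[inst[p] for inst in instances] for p in range(n)]
```

Any t of the n players can recover every secret independently; the point of the paper's computational scheme is to shrink each player's storage to a single element.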
Tue, 28 Apr 2015 17:42:09 GMT

On the representability of the biuniform matroid
http://hdl.handle.net/2117/24101
Ball, Simeon Michael; Padró Laimon, Carles; Weiner, Zsuzsa; Xing, Chaoping
Every biuniform matroid is representable over all sufficiently large fields, but it is not known exactly over which finite fields a given biuniform matroid is representable, and the existence of efficient methods to find a representation for every given biuniform matroid has not been proved. The interest of these problems is due to their implications for secret sharing. The existence of efficient methods to find representations for all biuniform matroids is proved here for the first time. The previously known efficient constructions apply only to a particular class of biuniform matroids, while the known general constructions were not proved to be efficient. In addition, our constructions provide in many cases representations over smaller finite fields.
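As standard background (not the paper's construction): the uniform matroid U_{r,n}, a special case of the biuniform family, is represented over GF(p) with p >= n by the columns of a Vandermonde matrix, since every r of them are linearly independent. A brute-force check of this fact for small parameters might look like:

```python
from itertools import combinations

def det_mod(M, p):
    """Determinant of a square matrix over GF(p) by Gaussian elimination."""
    M = [row[:] for row in M]
    n, det = len(M), 1
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] % p), None)
        if piv is None:
            return 0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], p - 2, p)  # p prime, so Fermat inversion works
        for r in range(c + 1, n):
            f = M[r][c] * inv % p
            for k in range(c, n):
                M[r][k] = (M[r][k] - f * M[c][k]) % p
    return det % p

def vandermonde_represents_uniform(r, n, p):
    """Check that the n Vandermonde columns (1, x, ..., x^{r-1}) over GF(p)
    represent U_{r,n}: every r of them must be linearly independent."""
    cols = [[pow(x, i, p) for i in range(r)] for x in range(n)]
    return all(det_mod([cols[i] for i in idx], p) != 0
               for idx in combinations(range(n), r))

print(vandermonde_represents_uniform(3, 7, 11))  # True
```

When p < n the evaluation points collide modulo p and the check fails, which is the elementary version of the field-size question the paper studies for biuniform matroids.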
© 2013, Society for Industrial and Applied Mathematics
Thu, 18 Sep 2014 16:05:12 GMT

Cropping Euler factors of modular L-functions
http://hdl.handle.net/2117/20759
González Rovira, Josep; Jiménez Urroz, Jorge; Lario Loyo, Joan Carles
According to the Birch and Swinnerton-Dyer conjectures, if A/Q is an abelian variety, then its L-function must capture a substantial part of the properties of A. The smallest number field L where A has all its endomorphisms defined must also play a role. This article deals with the relationship between these two objects in the specific case of modular abelian varieties Af/Q associated to weight-2 newforms for the group Gamma_1(N). Specifically, our goal is to relate ord_{s=1} L(Af/Q, s) with the order at s = 1 of Euler products restricted to primes that split completely in L. This is attained when a power of Af is isogenous over Q to the Weil restriction of the building block of Af. We give separate formulae for the CM and non-CM cases.
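In standard notation (the precise normalizations used in the paper may differ), the comparison is between the order of vanishing of the full Euler product and that of the product "cropped" to the completely split primes:

```latex
L(A_f/\mathbb{Q}, s) = \prod_{p} L_p(p^{-s})^{-1},
\qquad
L_{\mathrm{crop}}(A_f/\mathbb{Q}, s)
  = \prod_{\substack{p \ \text{splits} \\ \text{completely in } L}} L_p(p^{-s})^{-1},
```

and the paper relates the orders of these two objects at s = 1, with separate formulae in the CM and non-CM cases.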
Mon, 25 Nov 2013 17:05:43 GMT

More hybrid and secure protection of statistical data sets
http://hdl.handle.net/2117/17412
Herranz Sotoca, Javier; Nin Guerrero, Jordi; Solé Simó, Marc
Different methods and paradigms to protect data sets containing sensitive statistical information have been proposed and studied. The idea is to publish a perturbed version of the data set that does not leak confidential information, but that still allows users to obtain meaningful statistical values about the original data. The two main paradigms for data set protection are the classical one and the synthetic one. Recently, the possibility of combining the two paradigms, leading to a hybrid paradigm, has been considered. In this work, we first analyze the security of some synthetic and (partially) hybrid methods that have been proposed in recent years, and we conclude that they suffer from a high interval disclosure risk. We then propose the first fully hybrid SDC methods; unfortunately, they also suffer from a quite high interval disclosure risk. To mitigate this, we propose a post-processing technique that can be applied to any data set protected with a synthetic method, with the goal of reducing its interval disclosure risk. Throughout the paper we describe a set of experiments, performed on reference data sets, that support our claims.
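The exact interval disclosure measure is not given in the abstract. One simple variant, stated here as an assumption for illustration, counts how often an original value falls inside an interval centred on its protected counterpart:

```python
def interval_disclosure_risk(original, protected, width=0.1):
    """Fraction of records whose original value lies inside an interval
    centred on the protected value, with half-width equal to `width`
    times the attribute's overall range.

    original, protected: aligned value lists for one numeric attribute.
    """
    lo, hi = min(original), max(original)
    half = width * (hi - lo)
    hits = sum(1 for o, m in zip(original, protected) if abs(o - m) <= half)
    return hits / len(original)
```

An attacker who knows the protection method can bound each original value this way, which is why a synthetic method can score well on statistical utility yet still exhibit high interval disclosure risk.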
Thu, 17 Jan 2013 18:24:07 GMT

Kd-trees and the real disclosure risks of large statistical databases
http://hdl.handle.net/2117/16561
Herranz Sotoca, Javier; Nin Guerrero, Jordi; Solé Simó, Marc
In data privacy, record linkage can be used as an estimator of the disclosure risk of protected data. To model the worst-case scenario, one normally attempts to link records from the original data to the protected data. In this paper we introduce a parametrization of record linkage in terms of a weighted mean and its weights, and provide a supervised learning method to determine the optimal weights for the linkage process, that is, the parameters yielding a maximal record linkage between the protected and original data. We compare our method to standard record linkage with data from several protection methods widely used in statistical disclosure control, and study the results taking into account both the performance of the linkage process and its computational effort.
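The parametrized linkage can be sketched as follows. The supervised learning of the optimal weights is omitted, and the weighted squared distance below is an illustrative choice, not necessarily the weighted-mean parametrization used in the paper.

```python
def weighted_record_linkage(original, protected, weights):
    """Link each protected record to the closest original record under a
    weighted per-attribute distance, and return the number of correct
    links (protected record i is assumed to come from original record i).
    """
    def dist(a, b):
        return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))
    correct = 0
    for i, rec in enumerate(protected):
        best = min(range(len(original)), key=lambda j: dist(rec, original[j]))
        correct += (best == i)
    return correct
```

The weights matter: dropping an informative attribute (weight 0) can turn a correct link into a wrong one, which is exactly the degree of freedom the paper's learning method exploits to estimate worst-case risk.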
Tue, 25 Sep 2012 11:53:08 GMT

Orders of CM elliptic curves modulo p with at most two primes
http://hdl.handle.net/2117/15793
Iwaniec, H.; Jiménez Urroz, Jorge
Nowadays the design of cryptosystems involves two main requirements: first the security, and then the size of the keys involved in the construction and communication process. For the former, one needs a difficult mathematical assumption which ensures the system will not be broken unless a well-known difficult problem is solved. In this context, one of the most famous assumptions underlying a wide variety of cryptosystems is the computation of logarithms in finite fields and the Diffie-Hellman assumption. However, it is also well known that elliptic curves provide good representations of abelian groups, reducing the size of the keys needed to guarantee the same level of security as in the finite field case. The first thing one needs in order to make elliptic logarithms computationally secure is to fix a finite field Fp and a curve E/Fp defined over the field such that |E(Fp)| has a prime factor as large as possible. In practice, the problem of finding such a pair of curve and field seems simple: just take a curve with integer coefficients and a prime p of good reduction at random, and see whether |E(Fp)| has a big prime factor. However, the theory that makes the previous algorithm useful is by no means obvious, clear, or complete. For example, it is well known that supersingular elliptic curves have to be avoided in the previous process, since they reduce the security of any cryptosystem based on the Diffie-Hellman assumption on the elliptic logarithm. But, more importantly, the process is feasible only when the probability of finding a pair (E, p) with a big prime factor q | |E(Fp)| is large enough. One problem arises naturally from the above.
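The naive search described above (pick E and p at random, check whether |E(Fp)| has a large prime factor) can be sketched for small primes with brute-force point counting. Real systems would use Schoof-Elkies-Atkin point counting instead, and the curve below is an arbitrary example, not one from the paper.

```python
def curve_order(a, b, p):
    """|E(F_p)| for E: y^2 = x^3 + a*x + b, by brute force (small p only)."""
    assert (4 * a**3 + 27 * b**2) % p != 0, "singular curve"
    # Tally how many y solve y^2 = s for each quadratic residue s.
    squares = {}
    for y in range(p):
        s = y * y % p
        squares[s] = squares.get(s, 0) + 1
    count = 1  # the point at infinity
    for x in range(p):
        count += squares.get((x**3 + a * x + b) % p, 0)
    return count

def largest_prime_factor(n):
    """Largest prime factor of n by trial division."""
    f, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            f, n = d, n // d
        d += 1
    return max(f, n) if n > 1 else f

# Example search step: E: y^2 = x^3 + 2x + 3 over F_97
N = curve_order(2, 3, 97)
print(N, largest_prime_factor(N))
```

By the Hasse bound, N lies within 2*sqrt(p) of p + 1; the question the paper addresses is how often N is prime or has at most two prime factors as p varies.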
Tue, 08 May 2012 11:42:08 GMT

Classifying data from protected statistical datasets
http://hdl.handle.net/2117/14416
Herranz Sotoca, Javier; Matwin, Stan; Nin Guerrero, Jordi; Torra i Reventós, Vicenç
Statistical Disclosure Control (SDC) has been an active research area in recent years. The goal is to transform an original dataset X into a protected one X', such that X' does not reveal any relation between confidential and (quasi-)identifier attributes, and such that X' can be used to compute reliable statistical information about X. Many specific protection methods have been proposed and analyzed with respect to the levels of privacy and utility that they offer. However, when measuring utility, only differences between the statistical values of X and X' are considered. This would indicate that datasets protected by SDC methods can be used only for statistical purposes. We show in this paper that this is not the case, because a protected dataset X' can be used to construct good classifiers for future data. To do so, we describe an extensive set of experiments that we have run with different SDC protection methods and different (real) datasets. In general, the resulting classifiers are very good, which is good news for both the SDC and the Privacy-preserving Data Mining communities. In particular, our results question the necessity of some specific protection methods that have appeared in the privacy-preserving data mining (PPDM) literature with the clear goal of providing good classification.
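The experimental pipeline (protect the training data, then train and evaluate a classifier) can be sketched with a toy noise-addition method and a 1-nearest-neighbour classifier. Both are illustrative stand-ins for the SDC methods and classifiers actually evaluated in the paper.

```python
import random

def noise_addition(data, alpha=0.1, seed=0):
    """Toy SDC stand-in: add Gaussian noise scaled by each attribute's range."""
    rng = random.Random(seed)
    cols = list(zip(*data))
    spans = [max(c) - min(c) for c in cols]
    return [tuple(v + rng.gauss(0, alpha * s) for v, s in zip(rec, spans))
            for rec in data]

def knn_predict(train_x, train_y, rec):
    """1-nearest-neighbour classification under squared Euclidean distance."""
    d = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return train_y[min(range(len(train_x)), key=lambda j: d(rec, train_x[j]))]

# Train on the *protected* records, evaluate on the original ones.
X = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8)]
y = ["a", "a", "b", "b"]
Xp = noise_addition(X, alpha=0.05)
accuracy = sum(knn_predict(Xp, y, rec) == lab for rec, lab in zip(X, y)) / len(X)
print(accuracy)
```

When the perturbation is small relative to the separation between classes, accuracy stays high, which is the qualitative effect the paper's experiments measure across real SDC methods.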
Thu, 05 Jan 2012 13:01:13 GMT

On the disclosure risk of multivariate microaggregation
http://hdl.handle.net/2117/12852
Nin Guerrero, Jordi; Herranz Sotoca, Javier; Torra i Reventós, Vicenç
The aim of data protection methods is to protect a microdata file while both minimizing the disclosure risk and preserving the data utility. Microaggregation is one of the most popular such methods among statistical agencies. Record linkage is the standard mechanism used to measure the disclosure risk of a microdata protection method. However, only standard, and quite generic, record linkage methods are usually considered, whereas more specific record linkage techniques can be more appropriate to evaluate the disclosure risk of some protection methods.
In this paper we present a new record linkage technique, specific for microaggregation, which obtains more correct links than standard techniques. We have tested the new technique with MDAV microaggregation and two other microaggregation methods, based on projections, that we propose here for the first time. The direct consequence is that these microaggregation methods have a higher disclosure risk than believed up to now.
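A microaggregation-specific linkage exploits the fact that each protected value is a cluster centroid shared by at least k records. The univariate sketch below (fixed-size groups, naive nearest-value linkage) is an illustrative assumption, not the paper's technique.

```python
def microaggregate(values, k):
    """Univariate microaggregation: sort, take groups of k consecutive
    values in rank order, replace each value by its group mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out

def linkage_hits(original, protected):
    """Naive distance-based linkage: link each original value to the
    nearest protected value; a hit means it links to its own record's
    protected version (up to ties inside the same group)."""
    return sum(
        min(protected, key=lambda m: abs(m - o)) == protected[i]
        for i, o in enumerate(original)
    )

vals = [1.0, 2.0, 9.0, 10.0, 20.0, 21.0]
prot = microaggregate(vals, 2)
print(prot)                      # [1.5, 1.5, 9.5, 9.5, 20.5, 20.5]
print(linkage_hits(vals, prot))  # 6: every record links back to its group
```

Because each original value sits close to its own group centroid, even this naive linkage relinks every record to its group, illustrating why microaggregation's disclosure risk can be higher than generic linkage estimates suggest.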
Fri, 01 Jul 2011 11:20:50 GMT

How to group attributes in multivariate microaggregation
http://hdl.handle.net/2117/12851
Nin Guerrero, Jordi; Herranz Sotoca, Javier; Torra i Reventós, Vicenç
Microaggregation is one of the most widely employed microdata protection methods. It builds clusters of at least k original records and then replaces these records with the centroid of the cluster. When the number of attributes of the dataset is large, one usually splits the dataset into smaller blocks of attributes and then applies microaggregation to each block, successively and independently. In this way, the effect of the noise introduced by microaggregation is reduced, at the cost of losing the k-anonymity property. In this work we show that, besides the specific microaggregation method, the value of the parameter k and the number of blocks into which the dataset is split, there exists another factor which influences the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher, and so the disclosure risk is lower. We present quantitative evaluations of these statements based on different experiments on real datasets.
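The grouping step can be sketched with a greedy heuristic: seed each block with an unused attribute and fill it with the attributes most correlated with the seed (favouring utility); dispersing correlated attributes instead would favour privacy. The heuristic is illustrative; the paper evaluates grouping strategies empirically.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def correlated_blocks(columns, block_size):
    """Greedy grouping: repeatedly seed a block with the first unused
    attribute and fill it with the attributes most correlated (in
    absolute value) with that seed."""
    unused = list(range(len(columns)))
    blocks = []
    while unused:
        seed = unused.pop(0)
        rest = sorted(unused,
                      key=lambda j: -abs(pearson(columns[seed], columns[j])))
        chosen = rest[:block_size - 1]
        for j in chosen:
            unused.remove(j)
        blocks.append([seed] + chosen)
    return blocks
```

Each block would then be microaggregated independently; swapping the sort order (least correlated first) yields the privacy-oriented dispersal strategy discussed in the abstract.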
Fri, 01 Jul 2011 10:03:47 GMT