2018-04-21T15:45:42Z
A linear optimization based method for data privacy in statistical tabular data
Castro Pérez, Jordi; González Alastrué, José Antonio
National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem that looks for the safe table which is closest to the original one. The standard CTA approach results in a mixed integer linear optimization (MILO) problem, which is very challenging for current
technology. In this work we present a much less costly variant of CTA that formulates a multiobjective linear optimization (LO) problem, where binary variables are pre-fixed, and the resulting continuous problem is solved by lexicographic optimization. Extensive computational results are reported using both commercial (CPLEX and XPRESS) and open source (Clp) solvers, with either simplex or interior-point methods, on a set of real instances. Most instances were successfully solved with
the LO-CTA variant in less than one hour, while many of them are computationally very expensive with the MILO-CTA formulation. The interior-point method outperformed simplex in this particular application.
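As a hedged illustration of why pre-fixing the binary variables leaves a continuous problem, CTA is commonly written as follows. The notation is assumed here for illustration and is not given in the abstract: $z_i^+$ and $z_i^-$ are upward and downward deviations of cell $i$, $w_i$ are cell weights, $A$ encodes the table's additivity relations, $S$ is the set of sensitive cells with upper and lower protection levels $upl_i$ and $lpl_i$, $u_i^+$ and $u_i^-$ are deviation bounds, and $y_i$ chooses the protection direction.

```latex
\begin{align*}
\min_{z^+,\,z^-,\,y}\quad & \sum_{i=1}^{n} w_i \,(z_i^+ + z_i^-) \\
\text{s.t.}\quad & A\,(z^+ - z^-) = 0, \\
& 0 \le z_i^+ \le u_i^+, \quad 0 \le z_i^- \le u_i^-, && i \notin S, \\
& upl_i\, y_i \le z_i^+ \le u_i^+\, y_i, && i \in S, \\
& lpl_i\,(1 - y_i) \le z_i^- \le u_i^-\,(1 - y_i), && i \in S, \\
& y_i \in \{0, 1\}, && i \in S.
\end{align*}
```

Under this sketch, once every $y_i$ is fixed to 0 or 1 the last three groups of constraints reduce to simple bounds on $z_i^+$ and $z_i^-$, so only a linear problem remains. The abstract states that the resulting continuous problem is multiobjective and solved lexicographically, that is, with objectives optimized in priority order; which objectives are used is not stated here.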
Stabilized Benders methods for large-scale combinatorial optimization, with application to data privacy
Baena, Daniel; Castro Pérez, Jordi; Frangioni, Antonio
The Cell Suppression Problem (CSP) is a challenging Mixed-Integer Linear Problem arising in statistical tabular data protection. Medium-sized instances of CSP involve thousands of binary variables and millions of continuous variables and constraints. However, CSP has the typical structure that allows application of the renowned Benders’ decomposition method: once the “complicating” binary variables are fixed, the problem decomposes into a large set of linear subproblems on the “easy” continuous ones. This makes it possible to project away the easy variables, reducing the problem to a master problem in the complicating ones, where the value functions of the subproblems are approximated with the standard cutting-plane approach. Hence, Benders’ decomposition suffers from the same drawbacks as the cutting-plane method, i.e., oscillation and slow convergence, compounded by the fact that the master problem is combinatorial. To overcome these drawbacks we present a stabilized Benders decomposition whose master is restricted to a neighborhood of successful candidates by local branching constraints, which are dynamically adjusted, and even dropped, during the iterations. Our experiments with randomly generated and real-world CSP instances with up to 3600 binary variables, 90M continuous variables and 15M inequality constraints show that our approach is competitive with both the current state-of-the-art (cutting-plane-based) code for cell suppression and the Benders implementation in CPLEX 12.7. In some instances, stabilized Benders quickly provides a very good solution in less than one minute, while the other approaches were not able to find any feasible solution in one hour.
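The local branching constraints mentioned above can be illustrated with a minimal sketch. It assumes the usual Hamming-distance form of local branching for binary vectors; the function names and the radius `k` are hypothetical and are not taken from the paper's implementation.

```python
from typing import List, Sequence, Tuple


def hamming_distance(y: Sequence[int], y_ref: Sequence[int]) -> int:
    """Number of positions where two binary vectors differ."""
    if len(y) != len(y_ref):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(y, y_ref) if a != b)


def local_branching_row(y_ref: Sequence[int]) -> Tuple[List[int], int]:
    """Linear form of the distance to y_ref (assumed illustration).

    Returns (coeffs, const) such that, for any binary y,
        distance(y, y_ref) == const + sum(coeffs[i] * y[i]).
    Where y_ref[i] == 1 the term is (1 - y[i]); where y_ref[i] == 0 it is y[i].
    """
    coeffs = [1 - 2 * r for r in y_ref]  # -1 where the reference is 1, +1 where it is 0
    const = sum(y_ref)
    return coeffs, const


def in_neighborhood(y: Sequence[int], y_ref: Sequence[int], k: int) -> bool:
    """Check the constraint distance(y, y_ref) <= k via its linear form."""
    coeffs, const = local_branching_row(y_ref)
    return const + sum(c * v for c, v in zip(coeffs, y)) <= k


if __name__ == "__main__":
    reference = [1, 0, 1, 0]   # a previously successful candidate (made-up data)
    candidate = [1, 1, 0, 0]   # differs from the reference in two positions
    print(hamming_distance(candidate, reference))
    print(in_neighborhood(candidate, reference, k=2))
    print(in_neighborhood(candidate, reference, k=1))
```

In a master problem, the pair returned by `local_branching_row` would be added as one linear inequality with right-hand side `k`; enlarging, shrinking, or dropping that row corresponds to the dynamic adjustment the abstract describes.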
Virtual mobility lab: a systemic approach to urban mobility challenges
Barceló Bugeda, Jaume; Montero Mercadé, Lídia; Ros Roca, Xavier
Fundamentos teóricos del análisis de correspondencias
Martí Recober, Manuel; Aluja Banet, Tomàs; Bécue Bertaut, Mónica María
2017-11-23T09:03:26Z
Análisis de correspondencias múltiples sobre un grafo
Aluja Banet, Tomàs; Martí Recober, Manuel
In data analysis, one often analyzes data matrices made up of nominal variables that are correlated with others, called variables
Local and partial correspondence analysis application to the analysis of electoral data
Aluja Banet, Tomàs
In data analysis we must often analyze data sets whose observations are related by a graph structure. This is the case for electoral data, where the electoral units correspond to definite geographical areas. In this case it can be interesting to analyze the same phenomenon while fixing some a priori relation.
In the first part we present the rationale of these methods. Local analysis aims to eliminate the effect of the geographical position of individuals, represented by a contiguity graph, in an exploratory factorial analysis of spatial data. It also proves interesting to analyze the electoral results keeping the socio-economic position constant, by means of a similarity graph. This is called partial analysis because it is based on the same idea as Rao's instrumental variables and partial correlation analysis.
In the second part of the article, this methodology is applied to the data matrix formed by 1059 electoral units, called sections, giving the electoral results of the last autonomous election, held in 1984, in Barcelona. Moreover, it is interesting to define regions of units with homogeneous electoral behaviour, obtained by a clustering algorithm with a contiguity constraint.
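The abstract does not say which contiguity-constrained clustering algorithm was used. The following is a generic greedy agglomerative sketch, given only to illustrate the idea that two groups may merge only if the contiguity graph joins them; the function name, the one-dimensional `values`, and the toy data are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def contiguity_clustering(
    values: Sequence[float],
    edges: Sequence[Tuple[int, int]],
    n_clusters: int,
) -> List[List[int]]:
    """Greedy agglomerative clustering restricted to contiguous merges.

    values: one numeric score per unit (e.g. a vote share), assumed 1-D here.
    edges:  pairs of unit indices that are geographically contiguous.
    """
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(values))}
    label = list(range(len(values)))  # unit index -> current cluster id

    def mean(members: List[int]) -> float:
        return sum(values[m] for m in members) / len(members)

    while len(clusters) > n_clusters:
        # Only clusters joined by at least one contiguity edge may merge.
        pairs = {
            tuple(sorted((label[a], label[b])))
            for a, b in edges
            if label[a] != label[b]
        }
        if not pairs:
            break  # the graph is disconnected: no contiguous merge is left
        # Merge the contiguous pair whose cluster means are closest;
        # the pair itself breaks ties deterministically.
        a, b = min(
            pairs,
            key=lambda p: (abs(mean(clusters[p[0]]) - mean(clusters[p[1]])), p),
        )
        clusters[a].extend(clusters.pop(b))
        for m in clusters[a]:
            label[m] = a
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    shares = [10, 12, 50, 52, 11]             # made-up scores for five units
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # units laid out along a line
    print(contiguity_clustering(shares, path, 3))
```

In the toy data, unit 4 has a score close to units 0 and 1 but is contiguous only to unit 3, so it is not grouped with them; this is the effect the contiguity constraint is meant to have.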
Complementary remarks and improvements to a lagrangean heuristic for capacitated plant location problems
Barceló Bugeda, Jaime; Casanovas Garcia, Josep
In a former paper [1], a heuristic using multipliers from a Lagrangean relaxation was proposed for obtaining feasible solutions to a class of pure integer capacitated plant location problems. The heuristic consisted of three steps, the last one being a plant interchange step. Further computational experience has shown that the proposed interchange procedure could fail. In this paper we investigate the computational behaviour of the heuristic without the interchange procedure, and we report the results of our computational experience.
Clinical trial designs using CompARE. An on-line exploratory tool for investigators
Gómez Mateu, Moisés; Gómez Melis, Guadalupe
Conclusions from randomized clinical trials (RCT) rely primarily on the primary endpoint (PE) chosen at the design stage of the study. There should generally be only one PE, which should be able to provide the most clinically relevant and scientific evidence regarding the potential efficacy of the new treatment.
Therefore, it is of utmost importance to select it appropriately.
Composite endpoints (CE), consisting of the union of several endpoints, are often used as PE in RCT. Gomez and Lagakos (2013) develop a statistical methodology to evaluate the convenience of using a CE as opposed to one of its components.
Their strategy is based on the asymptotic relative efficiency (ARE), relating the efficiency of using the logrank test based on the CE versus the efficiency based on one of its components. This paper introduces the freeware online platform CompARE, which facilitates the study of the performance of different candidate endpoints that could be used as PE at the design stage of a trial. CompARE, through an intuitive interface, implements the novel ARE method.
Research report approved by the Doctorate and Research Committee of the EIO Department
Indústria 4.0 / Status Report Marc de referència sobre la Indústria 4.0 octubre 2016
Fonseca Casas, Pau
The purpose of this document is to present the elements of Industry 4.0 to engineers, to the Catalan industrial sector, and to society, so that it can serve as an instrument that facilitates debate and the construction of a standardized discourse around it. There is debate about the extent to which the marketing of Industry 4.0 runs ahead of reality, or vice versa. In any case, the aim of the i4.0 Commission of Enginyers de Catalunya is to contribute to establishing solid foundations and to formalizing the body of knowledge of Industry 4.0.
Generación automàtica de reglas difusas en dominios poco estructurados con variables numéricas
Vazquez, Fernando; Gibert, Karina
In this report, an application of a methodology of automatic
generation of conceptual descriptions for characterizing a given partition in an ill-structured domain is presented. A specific application on a wastewater treatment process (wwtp) illustrates the behaviour of this methodology. The methodology is based on the combination of statistical tools and inductive
learning, in such a way that the nature of the data is preserved, avoiding
previous transformations of the variables. Thus qualitative and
quantitative information can be induced from data. This information is
useful for the automatic generation of a system of fuzzy rules, which, in
turn, allows the subsequent recognition of the obtained classes.
In previous works it has been proved that the multiple box-plot is a useful
and powerful statistical tool for distinguishing classes by means of
numerical variables. It constitutes the basis for the methodology presented
here, which permits detection of the relevant variables that characterize
the classes.
In this report, we propose the first version of a formal methodology having
as an objective the automatic generation of conceptual class descriptions.
The goal is to characterize the various situations that can arise in a day
at a wastewater treatment plant (relevant information to facilitate the
plant's management and decision-making processes).
