Ponències/Comunicacions de congressos
http://hdl.handle.net/2117/3095
20191122T20:43:31Z

On the power of symmetric linear programs
http://hdl.handle.net/2117/171930
On the power of symmetric linear programs
Atserias, Albert; Dawar, Anuj; Ochremiak, Joanna
We consider families of symmetric linear programs (LPs) that decide a property of graphs (or other relational structures) in the sense that, for each size of graph, there is an LP defining a polyhedral lift that separates the integer points corresponding to graphs with the property from those corresponding to graphs without the property. We show that this is equivalent, with at most polynomial blowup in size, to families of symmetric Boolean circuits with threshold gates. In particular, when we consider polynomialsize LPs, the model is equivalent to definability in a nonuniform version of fixedpoint logic with counting (FPC). Known upper and lower bounds for FPC apply to the nonuniform version. In particular, this implies that the class of graphs with perfect matchings has polynomialsize symmetric LPs while we obtain an exponential lower bound for symmetric LPs for the class of Hamiltonian graphs. We compare and contrast this with previous results (Yannakakis 1991) showing that any symmetric LPs for the matching and TSP polytopes have exponential size. As an application, we establish that for random, uniformly distributed graphs, polynomialsize symmetric LPs are as powerful as general Boolean circuits. We illustrate the effect of this on the wellstudied plantedclique problem.
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
20191107T15:15:49Z
Atserias, Albert
Dawar, Anuj
Ochremiak, Joanna
We consider families of symmetric linear programs (LPs) that decide a property of graphs (or other relational structures) in the sense that, for each size of graph, there is an LP defining a polyhedral lift that separates the integer points corresponding to graphs with the property from those corresponding to graphs without the property. We show that this is equivalent, with at most polynomial blowup in size, to families of symmetric Boolean circuits with threshold gates. In particular, when we consider polynomialsize LPs, the model is equivalent to definability in a nonuniform version of fixedpoint logic with counting (FPC). Known upper and lower bounds for FPC apply to the nonuniform version. In particular, this implies that the class of graphs with perfect matchings has polynomialsize symmetric LPs while we obtain an exponential lower bound for symmetric LPs for the class of Hamiltonian graphs. We compare and contrast this with previous results (Yannakakis 1991) showing that any symmetric LPs for the matching and TSP polytopes have exponential size. As an application, we establish that for random, uniformly distributed graphs, polynomialsize symmetric LPs are as powerful as general Boolean circuits. We illustrate the effect of this on the wellstudied plantedclique problem.

Sizedegree tradeoffs for SumsofSquares and Positivstellensatz proofs
http://hdl.handle.net/2117/169823
Sizedegree tradeoffs for SumsofSquares and Positivstellensatz proofs
Atserias, Albert; Hakoniemi, Tuomas Antero
We show that if a system of degreek polynomial constraints on n Boolean variables has a SumsofSquares (SOS) proof of unsatisfiability with at most s many monomials, then it also has one whose degree is of the order of the square root of n log s plus k. A similar statement holds for the more general Positivstellensatz (PS) proofs. This establishes sizedegree tradeoffs for SOS and PS that match their analogues for weaker proof systems such as Resolution, Polynomial Calculus, and the proof systems for the LP and SDP hierarchies of Lovász and Schrijver. As a corollary to this, and to the known degree lower bounds, we get optimal integrality gaps for exponential size SOS proofs for sparse random instances of the standard NPhard constraint optimization problems. We also get exponential size SOS lower bounds for Tseitin and Knapsack formulas. The proof of our main result relies on a zerogap duality theorem for preordered vector spaces that admit an order unit, whose specialization to PS and SOS may be of independent interest.
20191011T10:36:39Z
Atserias, Albert
Hakoniemi, Tuomas Antero
We show that if a system of degreek polynomial constraints on n Boolean variables has a SumsofSquares (SOS) proof of unsatisfiability with at most s many monomials, then it also has one whose degree is of the order of the square root of n log s plus k. A similar statement holds for the more general Positivstellensatz (PS) proofs. This establishes sizedegree tradeoffs for SOS and PS that match their analogues for weaker proof systems such as Resolution, Polynomial Calculus, and the proof systems for the LP and SDP hierarchies of Lovász and Schrijver. As a corollary to this, and to the known degree lower bounds, we get optimal integrality gaps for exponential size SOS proofs for sparse random instances of the standard NPhard constraint optimization problems. We also get exponential size SOS lower bounds for Tseitin and Knapsack formulas. The proof of our main result relies on a zerogap duality theorem for preordered vector spaces that admit an order unit, whose specialization to PS and SOS may be of independent interest.

Circular (yet sound) proofs
http://hdl.handle.net/2117/169630
Circular (yet sound) proofs
Atserias, Albert; Lauria, Massimo
We introduce a new way of composing proofs in rulebased proof systems that generalizes treelike and daglike proofs. In the new definition, proofs are directed graphs of derived formulas, in which cycles are allowed as long as every formula is derived at least as many times as it is required as a premise. We call such proofs circular. We show that, for all sets of standard inference rules, circular proofs are sound. We first focus on the circular version of Resolution, and see that it is stronger than Resolution since, as we show, the pigeonhole principle has circular Resolution proofs of polynomial size. Surprisingly, as proof systems for deriving clauses from clauses, Circular Resolution turns out to be equivalent to SheraliAdams, a proof system for reasoning through polynomial inequalities that has linear programming at its base. As corollaries we get: (1) polynomialtime (LPbased) algorithms that find circular Resolution proofs of constant width, (2) examples that separate circular from daglike Resolution, such as the pigeonhole principle and its variants, and (3) exponentially hard cases for circular Resolution. Contrary to the case of circular resolution, for Frege we show that circular proofs can be converted into treelike ones with at most polynomial overhead.
20191010T09:31:47Z
Atserias, Albert
Lauria, Massimo
We introduce a new way of composing proofs in rulebased proof systems that generalizes treelike and daglike proofs. In the new definition, proofs are directed graphs of derived formulas, in which cycles are allowed as long as every formula is derived at least as many times as it is required as a premise. We call such proofs circular. We show that, for all sets of standard inference rules, circular proofs are sound. We first focus on the circular version of Resolution, and see that it is stronger than Resolution since, as we show, the pigeonhole principle has circular Resolution proofs of polynomial size. Surprisingly, as proof systems for deriving clauses from clauses, Circular Resolution turns out to be equivalent to SheraliAdams, a proof system for reasoning through polynomial inequalities that has linear programming at its base. As corollaries we get: (1) polynomialtime (LPbased) algorithms that find circular Resolution proofs of constant width, (2) examples that separate circular from daglike Resolution, such as the pigeonhole principle and its variants, and (3) exponentially hard cases for circular Resolution. Contrary to the case of circular resolution, for Frege we show that circular proofs can be converted into treelike ones with at most polynomial overhead.

RTLaware dataflowdriven macro placement
http://hdl.handle.net/2117/169341
RTLaware dataflowdriven macro placement
Vidal Obiols, Alexandre; Cortadella, Jordi; Petit Silvestre, Jordi; Galcerán Oms, Marc; Martorell Cid, Ferran
When RTL designers define the hierarchy of a system, they exploit their knowledge about the conceptual abstractions devised during the design and the functional interactions between the logical components. This valuable information is often lost during physical synthesis. This paper proposes a novel multilevel approach for the macro placement problem of complex designs dominated by macro blocks, typically memories. By taking advantage of the hierarchy tree, the netlist is divided into blocks containing macros and standard cells, and their dataflow affinity is inferred considering the latency and flow width of their interaction. The layout is represented using slicing structures and generated with a topdown algorithm capable of handling blocks with both hard and soft components, aimed at wirelength minimization. These techniques have been applied to a set of large industrial circuits and compared against both a commercial floorplanner and handcrafted floorplans by expert backend engineers. The proposed approach outperforms the commercial tool and produces solutions with similar quality to the best handcrafted floorplans. Therefore, the generated floorplans provide an excellent starting point for the physical design iterations and contribute to reduce turnaround time significantly.
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
20191008T07:23:04Z
Vidal Obiols, Alexandre
Cortadella, Jordi
Petit Silvestre, Jordi
Galcerán Oms, Marc
Martorell Cid, Ferran
When RTL designers define the hierarchy of a system, they exploit their knowledge about the conceptual abstractions devised during the design and the functional interactions between the logical components. This valuable information is often lost during physical synthesis. This paper proposes a novel multilevel approach for the macro placement problem of complex designs dominated by macro blocks, typically memories. By taking advantage of the hierarchy tree, the netlist is divided into blocks containing macros and standard cells, and their dataflow affinity is inferred considering the latency and flow width of their interaction. The layout is represented using slicing structures and generated with a topdown algorithm capable of handling blocks with both hard and soft components, aimed at wirelength minimization. These techniques have been applied to a set of large industrial circuits and compared against both a commercial floorplanner and handcrafted floorplans by expert backend engineers. The proposed approach outperforms the commercial tool and produces solutions with similar quality to the best handcrafted floorplans. Therefore, the generated floorplans provide an excellent starting point for the physical design iterations and contribute to reduce turnaround time significantly.

The dynamic pipeline paradigm
http://hdl.handle.net/2117/168520
The dynamic pipeline paradigm
Zoltán, Cristina; Pasarella Sánchez, Ana Edelmira; Aráoz Durand, Julián Arturo; Vidal, MariaEsther
Nowadays, in the era of Big Data and Internet of Things, large volumes of data in motion are produced in heterogeneous formats, frequencies, densities, and quantities. In general, data is continuously produced by diverse devices and most of them must be processed at realtime. Indeed, this change of paradigm in the way in which data are produced, forces us to rethink the way in which they should be processed even in presence of parallel approaches. To process continuous data, datadriven frameworks are demanded; they are required to dynamically adapt execution schedulers, reconfigure computational structures, and adjust the use of resources according to the characteristics of the input data stream. In previous work, we introduced the Dynamic Pipeline as one of these computational structures, and we experimentally showed its efficiency when it is used to solve the problem of counting triangles in a graph. In this work, our aim is to define the main components of the Dynamic Pipeline which is suitable to specify solutions to problems whose incoming data is heterogeneous data in motion. To be concrete, we define the Dynamic Pipeline Paradigm and, additionally, we show the applicability of our framework to specify different wellknown problems.
20190920T11:49:30Z
Zoltán, Cristina
Pasarella Sánchez, Ana Edelmira
Aráoz Durand, Julián Arturo
Vidal, MariaEsther
Nowadays, in the era of Big Data and Internet of Things, large volumes of data in motion are produced in heterogeneous formats, frequencies, densities, and quantities. In general, data is continuously produced by diverse devices and most of them must be processed at realtime. Indeed, this change of paradigm in the way in which data are produced, forces us to rethink the way in which they should be processed even in presence of parallel approaches. To process continuous data, datadriven frameworks are demanded; they are required to dynamically adapt execution schedulers, reconfigure computational structures, and adjust the use of resources according to the characteristics of the input data stream. In previous work, we introduced the Dynamic Pipeline as one of these computational structures, and we experimentally showed its efficiency when it is used to solve the problem of counting triangles in a graph. In this work, our aim is to define the main components of the Dynamic Pipeline which is suitable to specify solutions to problems whose incoming data is heterogeneous data in motion. To be concrete, we define the Dynamic Pipeline Paradigm and, additionally, we show the applicability of our framework to specify different wellknown problems.

Exact and heuristic allocation of multikernel applications to multiFPGA platforms
http://hdl.handle.net/2117/166417
Exact and heuristic allocation of multikernel applications to multiFPGA platforms
Shan, Junnan; Casu, Mario R.; Cortadella, Jordi; Lavagno, Luciano; Lazarescu, Mihai T.
FPGAbased accelerators demonstrated high energy efficiency compared to GPUs and CPUs. However, single FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of highperformance multikernel applications, like Convolutional Neural Networks, to multiFPGA platforms. First, we formulate the system level optimization problem, choosing within a huge design space the parallelism and number of compute units for each kernel in the pipeline. Then we solve it using a combination of Geometric Programming, producing the optimum performance solution given resource and DRAM bandwidth constraints, and a heuristic allocator of the compute units on the FPGA cluster.
20190718T10:24:50Z
Shan, Junnan
Casu, Mario R.
Cortadella, Jordi
Lavagno, Luciano
Lazarescu, Mihai T.
FPGAbased accelerators demonstrated high energy efficiency compared to GPUs and CPUs. However, single FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of highperformance multikernel applications, like Convolutional Neural Networks, to multiFPGA platforms. First, we formulate the system level optimization problem, choosing within a huge design space the parallelism and number of compute units for each kernel in the pipeline. Then we solve it using a combination of Geometric Programming, producing the optimum performance solution given resource and DRAM bandwidth constraints, and a heuristic allocator of the compute units on the FPGA cluster.

Sesquickselect: One and a half pivots for cacheefficient selection
http://hdl.handle.net/2117/143202
Sesquickselect: One and a half pivots for cacheefficient selection
Martínez Parra, Conrado; Nebel, Markus; Wild, Sebastian
Because of unmatched improvements in CPU performance, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, we have seen renewed interest in multiway Quicksort. Here, we analyze in how far multiway partitioning helps in Quickselect. We compute the expected number of comparisons and scanned elements (approximating memory transfers) for a generic class of (nonadaptive) multiway Quickselect and show that three or more pivots are not helpful, but two pivots are. Moreover, we consider “adaptive” variants which choose partitioning and pivotselection methods in each recursive step from a finite set of alternatives depending on the current (relative) sought rank. We show that “Sesquickselect”, a new Quickselect variant that uses either one or two pivots, makes better use of small samples w.r.t. memory transfers than other Quickselect variants. Copyright© (2019) by SIAM: Society for Industrial and Applied Mathematics.
20190627T07:32:22Z
Martínez Parra, Conrado
Nebel, Markus
Wild, Sebastian
Because of unmatched improvements in CPU performance, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, we have seen renewed interest in multiway Quicksort. Here, we analyze in how far multiway partitioning helps in Quickselect. We compute the expected number of comparisons and scanned elements (approximating memory transfers) for a generic class of (nonadaptive) multiway Quickselect and show that three or more pivots are not helpful, but two pivots are. Moreover, we consider “adaptive” variants which choose partitioning and pivotselection methods in each recursive step from a finite set of alternatives depending on the current (relative) sought rank. We show that “Sesquickselect”, a new Quickselect variant that uses either one or two pivots, makes better use of small samples w.r.t. memory transfers than other Quickselect variants. Copyright© (2019) by SIAM: Society for Industrial and Applied Mathematics.

Genet: a tool for the synthesis and mining of Petri nets
http://hdl.handle.net/2117/133838
Genet: a tool for the synthesis and mining of Petri nets
Carmona Vargas, Josep; Cortadella, Jordi; Kishinevsky, Michael
Statebased representations of concurrent systems suffer from the well known state explosion problem. In contrast, Petri nets are good models for this type of systems both in terms of complexity of the analysis and in visualization of the model. In this paper we present Genet, a tool that allows the derivation of a general Petri net from a statebased representation of a system. The tool supports two modes of operation: synthesis and mining. Applications of these two modes range from synthesis of digital systems to business intelligence.
20190603T09:56:02Z
Carmona Vargas, Josep
Cortadella, Jordi
Kishinevsky, Michael
Statebased representations of concurrent systems suffer from the well known state explosion problem. In contrast, Petri nets are good models for this type of systems both in terms of complexity of the analysis and in visualization of the model. In this paper we present Genet, a tool that allows the derivation of a general Petri net from a statebased representation of a system. The tool supports two modes of operation: synthesis and mining. Applications of these two modes range from synthesis of digital systems to business intelligence.

Lazy transition systems: application to timing optimization of asynchronous circuits
http://hdl.handle.net/2117/133832
Lazy transition systems: application to timing optimization of asynchronous circuits
Cortadella, Jordi; Kishinevsky, Michael; Kondratyev, Alex; Lavagno, Luciano; Taubin, Alexander; Yakovlev, Alex
The paper introduces Lazy Transitions Systems (LzTSs). The notion of laziness explicitly distinguishes between the enabling and the firing of an event in a transition system. LzTSs can be effectively used to model the behavior of asynchronous circuits in which relative timing assumptions can be made on the occurrence of events. These assumptions can be derived from the information known a priori about the delay of the environment and the timing characteristics of the gates that will implement the circuit. The paper presents necessary conditions to synthesize circuits with a correct behavior under the given timing assumptions. Preliminary results show that significant area and performance improvements can be obtained by exploiting the extra "don't care" space implicitly provided by the laziness of the events.
20190603T09:05:16Z
Cortadella, Jordi
Kishinevsky, Michael
Kondratyev, Alex
Lavagno, Luciano
Taubin, Alexander
Yakovlev, Alex
The paper introduces Lazy Transitions Systems (LzTSs). The notion of laziness explicitly distinguishes between the enabling and the firing of an event in a transition system. LzTSs can be effectively used to model the behavior of asynchronous circuits in which relative timing assumptions can be made on the occurrence of events. These assumptions can be derived from the information known a priori about the delay of the environment and the timing characteristics of the gates that will implement the circuit. The paper presents necessary conditions to synthesize circuits with a correct behavior under the given timing assumptions. Preliminary results show that significant area and performance improvements can be obtained by exploiting the extra "don't care" space implicitly provided by the laziness of the events.

Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers
http://hdl.handle.net/2117/133830
Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers
Cortadella, Jordi; Kishinevsky, Michael; Kondratyev, Alex; Lavagno, Luciano; Yakovlev, Alex
Petrifyis a tool for (1) manipulating concurrent specifications and (2) synthesis and optimization of asynchronous control circuits. Given a Petri Net (PN), a Signal Transition Graph (STG), or a Transition System (TS)1it (1)generates another PN or STG which is simpler than the original description and (2) produces an optimized netlistof an asynchronous controller in the target gate library while preserving the specified inputoutput behavior. Given a specification petrify provides a designer with a netlist of anasynchronous circuit and a PNlike description of the circuit behavior in terms of events and ordering relations between events. The latter ability of backannotating to the specification level helps the designer to control the design process. For transforming a specification petrify performs a token flow analysis of the initial PN and produces a transition system (TS). In the initial TS, all transitions with the same label are considered as one event. The TS is then transformed and transitions relabeled to fulfill the conditions required to obtain a safe irredundant PN. For synthesis of an asynchronous implementation petrify performs state assignment by solving the Complete State Coding problem. State assignment is coupled with logic minimization and speedindependent technology mapping to a target library. The final netlist is guaranteed to be speedindependent, i.e., hazardfree under any distribution of gate delays and multiple input changes satisfying the initial specification. The tool has been used for synthesis of PNs and PNs composition [10], synthesis [7, 8, 9] andresynthesis [29] of asynchronous controllers and can be also applied in areas related with the analysis of concurrent programs. This paper provides an overview of petrify and the theory behind its main functions.
20190603T08:44:03Z
Cortadella, Jordi
Kishinevsky, Michael
Kondratyev, Alex
Lavagno, Luciano
Yakovlev, Alex
Petrifyis a tool for (1) manipulating concurrent specifications and (2) synthesis and optimization of asynchronous control circuits. Given a Petri Net (PN), a Signal Transition Graph (STG), or a Transition System (TS)1it (1)generates another PN or STG which is simpler than the original description and (2) produces an optimized netlistof an asynchronous controller in the target gate library while preserving the specified inputoutput behavior. Given a specification petrify provides a designer with a netlist of anasynchronous circuit and a PNlike description of the circuit behavior in terms of events and ordering relations between events. The latter ability of backannotating to the specification level helps the designer to control the design process. For transforming a specification petrify performs a token flow analysis of the initial PN and produces a transition system (TS). In the initial TS, all transitions with the same label are considered as one event. The TS is then transformed and transitions relabeled to fulfill the conditions required to obtain a safe irredundant PN. For synthesis of an asynchronous implementation petrify performs state assignment by solving the Complete State Coding problem. State assignment is coupled with logic minimization and speedindependent technology mapping to a target library. The final netlist is guaranteed to be speedindependent, i.e., hazardfree under any distribution of gate delays and multiple input changes satisfying the initial specification. The tool has been used for synthesis of PNs and PNs composition [10], synthesis [7, 8, 9] andresynthesis [29] of asynchronous controllers and can be also applied in areas related with the analysis of concurrent programs. This paper provides an overview of petrify and the theory behind its main functions.