Conference presentations/papers (Ponències/Comunicacions de congressos)
http://hdl.handle.net/2117/3095
2019-10-17T07:42:46Z
Size-degree trade-offs for Sums-of-Squares and Positivstellensatz proofs
Atserias, Albert
Hakoniemi, Tuomas Antero
http://hdl.handle.net/2117/169823
2019-10-16T03:38:35Z
2019-10-11T10:36:39Z
We show that if a system of degree-k polynomial constraints on n Boolean variables has a Sums-of-Squares (SOS) proof of unsatisfiability with at most s monomials, then it also has one whose degree is of the order of the square root of n log s plus k. A similar statement holds for the more general Positivstellensatz (PS) proofs. This establishes size-degree trade-offs for SOS and PS that match their analogues for weaker proof systems such as Resolution, Polynomial Calculus, and the proof systems for the LP and SDP hierarchies of Lovász and Schrijver. As a corollary to this, and to the known degree lower bounds, we get optimal integrality gaps for exponential size SOS proofs for sparse random instances of the standard NP-hard constraint optimization problems. We also get exponential size SOS lower bounds for Tseitin and Knapsack formulas. The proof of our main result relies on a zero-gap duality theorem for pre-ordered vector spaces that admit an order unit, whose specialization to PS and SOS may be of independent interest.
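Stated symbolically (our rendering of the trade-off described in the abstract, not the paper's formal theorem statement):

```latex
% Size-degree trade-off for SOS refutations, with:
%   n : number of Boolean variables
%   k : maximum degree of the polynomial constraints
%   s : number of monomials in the given SOS refutation
%   d : degree of the refutation whose existence is guaranteed
d \;=\; O\!\left(\sqrt{\,n \log s\,} + k\right)
```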
Circular (yet sound) proofs
Atserias, Albert
Lauria, Massimo
http://hdl.handle.net/2117/169630
2019-10-11T05:24:25Z
2019-10-10T09:31:47Z
We introduce a new way of composing proofs in rule-based proof systems that generalizes tree-like and dag-like proofs. In the new definition, proofs are directed graphs of derived formulas, in which cycles are allowed as long as every formula is derived at least as many times as it is required as a premise. We call such proofs circular. We show that, for all sets of standard inference rules, circular proofs are sound. We first focus on the circular version of Resolution, and see that it is stronger than Resolution since, as we show, the pigeonhole principle has circular Resolution proofs of polynomial size. Surprisingly, as proof systems for deriving clauses from clauses, Circular Resolution turns out to be equivalent to Sherali-Adams, a proof system for reasoning through polynomial inequalities that has linear programming at its base. As corollaries we get: (1) polynomial-time (LP-based) algorithms that find circular Resolution proofs of constant width, (2) examples that separate circular from dag-like Resolution, such as the pigeonhole principle and its variants, and (3) exponentially hard cases for circular Resolution. In contrast to the case of circular Resolution, for Frege we show that circular proofs can be converted into tree-like ones with at most polynomial overhead.
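The balance condition on circular proofs ("every formula is derived at least as many times as it is required as a premise") can be illustrated with a toy counting check; this is our simplified sketch, not the paper's full definition, which also involves a global flow condition on the derivation graph:

```python
from collections import Counter

def is_circularly_balanced(steps, axioms):
    """Toy check of the circular-proof balance condition.

    `steps` is a list of (premises, conclusion) pairs; `axioms` is the
    set of formulas available for free.  A non-axiom formula may be
    used as a premise at most as many times as it is derived, even
    though the derivation graph may contain cycles.
    """
    derived = Counter(conclusion for _, conclusion in steps)
    used = Counter(p for premises, _ in steps for p in premises)
    return all(used[f] <= derived[f] for f in used if f not in axioms)
```

For example, a two-step cycle in which each of two formulas is derived once and used once satisfies the condition, while using a formula that is never derived (and is not an axiom) violates it.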
RTL-aware dataflow-driven macro placement
Vidal Obiols, Alexandre
Cortadella, Jordi
Petit Silvestre, Jordi
Galcerán Oms, Marc
Martorell Cid, Ferran
http://hdl.handle.net/2117/169341
2019-10-09T05:31:51Z
2019-10-08T07:23:04Z
When RTL designers define the hierarchy of a system, they exploit their knowledge about the conceptual abstractions devised during the design and the functional interactions between the logical components. This valuable information is often lost during physical synthesis. This paper proposes a novel multi-level approach for the macro placement problem of complex designs dominated by macro blocks, typically memories. By taking advantage of the hierarchy tree, the netlist is divided into blocks containing macros and standard cells, and their dataflow affinity is inferred considering the latency and flow width of their interaction. The layout is represented using slicing structures and generated with a top-down algorithm capable of handling blocks with both hard and soft components, aimed at wirelength minimization. These techniques have been applied to a set of large industrial circuits and compared against both a commercial floorplanner and floorplans handcrafted by expert back-end engineers. The proposed approach outperforms the commercial tool and produces solutions with similar quality to the best handcrafted floorplans. Therefore, the generated floorplans provide an excellent starting point for the physical design iterations and contribute to significantly reducing turn-around time.
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The dynamic pipeline paradigm
Zoltán, Cristina
Pasarella Sánchez, Ana Edelmira
Aráoz Durand, Julián Arturo
Vidal, Maria-Esther
http://hdl.handle.net/2117/168520
2019-09-21T05:20:39Z
2019-09-20T11:49:30Z
Nowadays, in the era of Big Data and the Internet of Things, large volumes of data in motion are produced in heterogeneous formats, frequencies, densities, and quantities. Data is continuously produced by diverse devices, and much of it must be processed in real time. This change of paradigm in the way data are produced forces us to rethink the way they should be processed, even in the presence of parallel approaches. Processing continuous data demands data-driven frameworks that can dynamically adapt execution schedulers, reconfigure computational structures, and adjust the use of resources according to the characteristics of the input data stream. In previous work, we introduced the Dynamic Pipeline as one of these computational structures, and we experimentally showed its efficiency when used to solve the problem of counting triangles in a graph. In this work, our aim is to define the main components of the Dynamic Pipeline that make it suitable for specifying solutions to problems whose incoming data is heterogeneous data in motion. Concretely, we define the Dynamic Pipeline Paradigm and show the applicability of our framework to the specification of several well-known problems.
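The core idea of a dynamic pipeline, namely computational stages created on demand as data flows through, can be sketched with the classic generator-based sieve of Eratosthenes (our illustrative example, not the paper's formalism):

```python
def dynamic_sieve(limit):
    """Toy dynamic pipeline: each prime found at the end of the
    pipeline spawns a new filter stage, so the pipeline grows
    dynamically as a function of the data itself."""
    def source():
        yield from range(2, limit)

    def filter_stage(prime, upstream):
        # A stage: forwards only the values not divisible by its prime.
        for n in upstream:
            if n % prime != 0:
                yield n

    primes = []
    stream = source()
    while True:
        try:
            p = next(stream)          # first survivor is a new prime
        except StopIteration:
            return primes
        primes.append(p)
        stream = filter_stage(p, stream)  # append a stage to the pipeline
```

Each call to `filter_stage` extends the chain of generators, mirroring how a dynamic pipeline reconfigures its computational structure in response to the input stream.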
Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms
Shan, Junnan
Casu, Mario R.
Cortadella, Jordi
Lavagno, Luciano
Lazarescu, Mihai T.
http://hdl.handle.net/2117/166417
2019-07-19T05:12:22Z
2019-07-18T10:24:50Z
FPGA-based accelerators have demonstrated high energy efficiency compared to GPUs and CPUs. However, single-FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of high-performance multi-kernel applications, such as Convolutional Neural Networks, to multi-FPGA platforms. First, we formulate the system-level optimization problem, choosing within a huge design space the parallelism and number of compute units for each kernel in the pipeline. Then we solve it using a combination of Geometric Programming, producing the optimum performance solution given resource and DRAM bandwidth constraints, and a heuristic allocator of the compute units on the FPGA cluster.
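The flavor of the heuristic allocation step can be conveyed with a hypothetical first-fit-decreasing sketch; the paper's actual allocator and resource model are not specified here, so the cost model below is an assumption for illustration only:

```python
def allocate_first_fit(units, fpgas):
    """Hypothetical first-fit-decreasing allocator sketch.

    `units` maps each kernel's compute unit to a scalar resource cost;
    `fpgas` lists the free resource budget of each FPGA in the cluster.
    Units are placed in decreasing cost order on the first FPGA with
    enough remaining resources.  Returns a unit -> FPGA-index map, or
    None if the workload does not fit on the cluster.
    """
    remaining = list(fpgas)
    placement = {}
    for name, cost in sorted(units.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(remaining):
            if cost <= free:
                remaining[i] -= cost
                placement[name] = i
                break
        else:
            return None  # no FPGA can host this unit
    return placement
```

A real allocator would track multiple resource types (LUTs, BRAM, DSPs) and DRAM bandwidth per FPGA rather than a single scalar budget.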
Sesquickselect: One and a half pivots for cache-efficient selection
Martínez Parra, Conrado
Nebel, Markus
Wild, Sebastian
http://hdl.handle.net/2117/143202
2019-06-28T05:19:07Z
2019-06-27T07:32:22Z
Because of unmatched improvements in CPU performance, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, we have seen renewed interest in multiway Quicksort. Here, we analyze to what extent multiway partitioning helps in Quickselect. We compute the expected number of comparisons and scanned elements (approximating memory transfers) for a generic class of (non-adaptive) multiway Quickselect and show that three or more pivots are not helpful, but two pivots are. Moreover, we consider “adaptive” variants which choose partitioning and pivot-selection methods in each recursive step from a finite set of alternatives depending on the current (relative) sought rank. We show that “Sesquickselect”, a new Quickselect variant that uses either one or two pivots, makes better use of small samples w.r.t. memory transfers than other Quickselect variants. Copyright © 2019 by SIAM: Society for Industrial and Applied Mathematics.
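The rank-adaptive idea, two pivots when the sought rank is central and one when it is extremal, can be sketched as follows; this is our simplification (assuming distinct elements and an arbitrary 0.25–0.75 threshold), not the paper's tuned algorithm or its sample-based pivot selection:

```python
import random

def sesqui_select(a, k):
    """Toy rank-adaptive Quickselect: 0-indexed rank k, distinct elements."""
    a = list(a)
    while True:
        n = len(a)
        if n == 1:
            return a[0]
        if 0.25 <= k / n <= 0.75:
            # Central rank: partition around two pivots.
            p, q = sorted(random.sample(a, 2))
            lo = [x for x in a if x < p]
            mid = [x for x in a if p < x < q]
            if k < len(lo):
                a = lo
            elif k == len(lo):
                return p
            elif k < len(lo) + 1 + len(mid):
                a, k = mid, k - len(lo) - 1
            elif k == len(lo) + 1 + len(mid):
                return q
            else:
                a, k = [x for x in a if x > q], k - len(lo) - len(mid) - 2
        else:
            # Extremal rank: classic single-pivot step.
            p = random.choice(a)
            lo = [x for x in a if x < p]
            if k < len(lo):
                a = lo
            elif k == len(lo):
                return p
            else:
                a, k = [x for x in a if x > p], k - len(lo) - 1
```

The point of the two-pivot step is that a central rank likely falls in the middle segment, which both pivots help shrink, reducing the elements scanned in later rounds.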
Genet: a tool for the synthesis and mining of Petri nets
Carmona Vargas, Josep
Cortadella, Jordi
Kishinevsky, Michael
http://hdl.handle.net/2117/133838
2019-07-01T10:45:11Z
2019-06-03T09:56:02Z
State-based representations of concurrent systems suffer from the well-known state explosion problem. In contrast, Petri nets are good models for this type of system, both in terms of the complexity of the analysis and the visualization of the model. In this paper we present Genet, a tool that allows the derivation of a general Petri net from a state-based representation of a system. The tool supports two modes of operation: synthesis and mining. Applications of these two modes range from synthesis of digital systems to business intelligence.
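As background for the Petri net model the abstract refers to, a minimal token-firing sketch (our illustration, restricted to at most one arc per place; Genet's synthesis algorithm itself is far more involved):

```python
def fire(marking, pre, post):
    """Fire one Petri-net transition from `marking` (place -> tokens).

    The transition is enabled iff every pre-place holds a token;
    firing consumes one token per pre-place and produces one per
    post-place.  Returns the new marking, or None if not enabled.
    """
    if any(marking.get(p, 0) < 1 for p in pre):
        return None  # transition not enabled in this marking
    m = dict(marking)
    for p in pre:
        m[p] -= 1
    for p in post:
        m[p] = m.get(p, 0) + 1
    return m
```

Exploring all markings reachable through such firings is exactly where the state explosion problem appears, which is why working at the net level is attractive.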
Lazy transition systems: application to timing optimization of asynchronous circuits
Cortadella, Jordi
Kishinevsky, Michael
Kondratyev, Alex
Lavagno, Luciano
Taubin, Alexander
Yakovlev, Alex
http://hdl.handle.net/2117/133832
2019-06-04T05:19:33Z
2019-06-03T09:05:16Z
The paper introduces Lazy Transition Systems (LzTSs). The notion of laziness explicitly distinguishes between the enabling and the firing of an event in a transition system. LzTSs can be effectively used to model the behavior of asynchronous circuits in which relative timing assumptions can be made on the occurrence of events. These assumptions can be derived from the information known a priori about the delay of the environment and the timing characteristics of the gates that will implement the circuit. The paper presents necessary conditions to synthesize circuits with a correct behavior under the given timing assumptions. Preliminary results show that significant area and performance improvements can be obtained by exploiting the extra "don't care" space implicitly provided by the laziness of the events.
Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers
Cortadella, Jordi
Kishinevsky, Michael
Kondratyev, Alex
Lavagno, Luciano
Yakovlev, Alex
http://hdl.handle.net/2117/133830
2019-06-04T05:19:32Z
2019-06-03T08:44:03Z
Petrify is a tool for (1) manipulating concurrent specifications and (2) synthesis and optimization of asynchronous control circuits. Given a Petri Net (PN), a Signal Transition Graph (STG), or a Transition System (TS), it (1) generates another PN or STG which is simpler than the original description and (2) produces an optimized net-list of an asynchronous controller in the target gate library while preserving the specified input-output behavior. Given a specification, Petrify provides a designer with a net-list of an asynchronous circuit and a PN-like description of the circuit behavior in terms of events and ordering relations between events. The latter ability to back-annotate to the specification level helps the designer control the design process. For transforming a specification, Petrify performs a token flow analysis of the initial PN and produces a transition system (TS). In the initial TS, all transitions with the same label are considered as one event. The TS is then transformed and transitions relabeled to fulfill the conditions required to obtain a safe irredundant PN. For synthesis of an asynchronous implementation, Petrify performs state assignment by solving the Complete State Coding problem. State assignment is coupled with logic minimization and speed-independent technology mapping to a target library. The final net-list is guaranteed to be speed-independent, i.e., hazard-free under any distribution of gate delays and multiple input changes satisfying the initial specification. The tool has been used for synthesis and composition of PNs [10], synthesis [7, 8, 9] and re-synthesis [29] of asynchronous controllers, and can also be applied in areas related to the analysis of concurrent programs. This paper provides an overview of Petrify and the theory behind its main functions.
Task generation and compile-time scheduling for mixed data-control embedded software
Cortadella, Jordi
Kondratyev, Alex
Lavagno, Luciano
Massot, Marc
Moral Boadas, Sandra
Passerone, Claudio
Watanabe, Yosinori
Sangiovanni-Vincentelli, Alberto
http://hdl.handle.net/2117/133638
2019-05-30T05:18:49Z
2019-05-29T09:50:36Z
The problem of optimal software synthesis for concurrent processes to be implemented on a single processor is addressed. The approach calls for the representation of the concurrent processes with Petri nets, which give a theoretical foundation for the scheduling algorithm that sequentializes the concurrent processes and for the code generation step. The approach maximizes the amount of static scheduling to reduce the need for context switches and operating system intervention. Experimental results show the potential of our method to reduce software design time and errors.