Conference papers/communications
http://hdl.handle.net/2117/3095
2019-07-20T19:59:36Z

Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms
http://hdl.handle.net/2117/166417
Shan, Junnan; Casu, Mario R.; Cortadella, Jordi; Lavagno, Luciano; Lazarescu, Mihai T.
FPGA-based accelerators have demonstrated high energy efficiency compared to GPUs and CPUs. However, single-FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of high-performance multi-kernel applications, such as Convolutional Neural Networks, to multi-FPGA platforms. First, we formulate the system-level optimization problem, choosing, within a huge design space, the parallelism and number of compute units for each kernel in the pipeline. Then we solve it using a combination of Geometric Programming, which produces the optimum-performance solution given resource and DRAM bandwidth constraints, and a heuristic allocator of the compute units on the FPGA cluster.
2019-07-18T10:24:50Z
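The abstract does not describe the allocator itself. The sketch below illustrates the kind of problem it solves by placing compute units onto FPGAs with first-fit decreasing, a standard bin-packing heuristic. It uses a single scalar resource per board. The `Fpga` class, the resource figures and the choice of heuristic are all assumptions, not the paper's method.

```python
from dataclasses import dataclass, field


@dataclass
class Fpga:
    """One board in the cluster with a single scalar resource budget (hypothetical model)."""
    name: str
    capacity: float
    used: float = 0.0
    units: list = field(default_factory=list)


def allocate(units, fpgas):
    """First-fit decreasing: place the most expensive compute units first.

    `units` maps a compute-unit name to its resource cost (hypothetical units).
    Returns {unit_name: fpga_name}; raises ValueError if a unit fits nowhere.
    """
    placement = {}
    for name, cost in sorted(units.items(), key=lambda kv: kv[1], reverse=True):
        for board in fpgas:
            if board.used + cost <= board.capacity:  # first board with room wins
                board.used += cost
                board.units.append(name)
                placement[name] = board.name
                break
        else:
            raise ValueError(f"compute unit {name!r} does not fit on any FPGA")
    return placement
```

For example, with two boards of capacity 100 and units costing 60, 60, 30 and 20, each board receives one 60-unit. The 30-unit joins the first board and the 20-unit the second. Real boards have several resource types (LUTs, DSPs, BRAM) and inter-FPGA bandwidth limits, which a one-dimensional packing like this ignores.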

Sesquickselect: One and a half pivots for cache-efficient selection
http://hdl.handle.net/2117/143202
Martínez Parra, Conrado; Nebel, Markus; Wild, Sebastian
Because improvements in CPU performance have not been matched by memory, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, multiway Quicksort has received renewed interest. Here, we analyze to what extent multiway partitioning helps in Quickselect. We compute the expected number of comparisons and scanned elements (approximating memory transfers) for a generic class of (non-adaptive) multiway Quickselect and show that three or more pivots are not helpful, but two pivots are. Moreover, we consider “adaptive” variants that choose the partitioning and pivot-selection methods in each recursive step from a finite set of alternatives, depending on the current (relative) sought rank. We show that “Sesquickselect”, a new Quickselect variant that uses either one or two pivots, makes better use of small samples w.r.t. memory transfers than other Quickselect variants. Copyright © (2019) by SIAM: Society for Industrial and Applied Mathematics.
2019-06-27T07:32:22Z
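The following Python sketch shows an adaptive Quickselect that partitions around either one or two pivots. It chooses the pivot count at each step from the relative sought rank k/n. Several details are illustrative assumptions: the band (0.3, 0.7) that triggers two pivots, the random pivot choice and the out-of-place partitioning. This is not Sesquickselect itself, and it does not model memory transfers.

```python
import random


def _partition(window, pivots):
    """Split `window` into ordered groups around sorted, distinct `pivots`.

    Returns (elements, is_pivot_group) pairs in increasing value order, e.g.
    [< p], [== p], [p < x < q], [== q], [> q] for two pivots p < q.
    """
    groups, lower = [], None
    for p in pivots:
        groups.append(([x for x in window if (lower is None or x > lower) and x < p], False))
        groups.append(([x for x in window if x == p], True))
        lower = p
    groups.append(([x for x in window if x > lower], False))
    return groups


def quickselect(items, k, band=(0.3, 0.7), rng=random):
    """Return the k-th smallest element (0-based) of `items`.

    Hypothetical adaptive rule: use two pivots when the relative sought rank
    k/n lies inside `band`, otherwise one pivot. Pivots are drawn at random.
    """
    window = list(items)
    if not 0 <= k < len(window):
        raise IndexError("rank out of range")
    while len(window) > 1:
        n = len(window)
        count = 2 if band[0] <= k / n <= band[1] else 1
        pivots = sorted(set(rng.sample(window, count)))  # equal samples collapse to one pivot
        for group, is_pivot in _partition(window, pivots):
            if k < len(group):
                if is_pivot:
                    return group[0]  # every element of a pivot group is the answer
                window = group       # continue in the group that holds rank k
                break
            k -= len(group)          # skip this group and rebase the rank
    return window[0]
```

For example, `quickselect([7, 1, 5, 3], 1)` returns 3, the second-smallest value. A cache-conscious implementation would partition in place instead of building new lists.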

Genet: a tool for the synthesis and mining of Petri nets
http://hdl.handle.net/2117/133838
Carmona Vargas, Josep; Cortadella, Jordi; Kishinevsky, Michael
State-based representations of concurrent systems suffer from the well-known state explosion problem. In contrast, Petri nets are good models for this type of system, both in terms of the complexity of the analysis and the visualization of the model. In this paper we present Genet, a tool that derives a general Petri net from a state-based representation of a system. The tool supports two modes of operation: synthesis and mining. Applications of these two modes range from the synthesis of digital systems to business intelligence.
2019-06-03T09:56:02Z
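Deriving a Petri net from a state-based representation is classically grounded in the theory of regions. The abstract does not describe Genet's internals. As a hedged illustration, the Python sketch below checks the basic (safe-net) region condition on a transition system given as (source, event, target) triples. A set of states is a region if every event consistently enters it, exits it, or does not cross its border. The function names are hypothetical.

```python
def is_region(transitions, r):
    """Return True if state set `r` is a region of the transition system.

    `transitions` is an iterable of (src, event, dst) triples. Each event must
    have exactly one crossing kind: all "enter", all "exit", or all "nocross".
    """
    kinds = {}
    for src, ev, dst in transitions:
        if src not in r and dst in r:
            kind = "enter"
        elif src in r and dst not in r:
            kind = "exit"
        else:
            kind = "nocross"
        kinds.setdefault(ev, set()).add(kind)
    return all(len(k) == 1 for k in kinds.values())


def place_arcs(transitions, r):
    """For a region `r`, return (input events, output events) of its place."""
    enter = {ev for s, ev, d in transitions if s not in r and d in r}
    leave = {ev for s, ev, d in transitions if s in r and d not in r}
    return enter, leave
```

In a diamond where concurrent events `a` and `b` lead from `s0` to `s3`, the set {`s1`, `s3`} reached by `a` is a region. The set {`s1`} alone is not, because `a` enters it from `s0` but not from `s2`. Each region becomes a place, with the entering events as its input transitions and the exiting events as its output transitions. Synthesis of general (non-safe) nets relies on generalized notions of region.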

Lazy transition systems: application to timing optimization of asynchronous circuits
http://hdl.handle.net/2117/133832
Cortadella, Jordi; Kishinevsky, Michael; Kondratyev, Alex; Lavagno, Luciano; Taubin, Alexander; Yakovlev, Alex
The paper introduces Lazy Transition Systems (LzTSs). The notion of laziness explicitly distinguishes between the enabling and the firing of an event in a transition system. LzTSs can be used effectively to model the behavior of asynchronous circuits in which relative timing assumptions can be made on the occurrence of events. These assumptions can be derived from information known a priori about the delay of the environment and the timing characteristics of the gates that will implement the circuit. The paper presents necessary conditions for synthesizing circuits that behave correctly under the given timing assumptions. Preliminary results show that significant area and performance improvements can be obtained by exploiting the extra "don't care" space implicitly provided by the laziness of the events.
2019-06-03T09:05:16Z

Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers
http://hdl.handle.net/2117/133830
Cortadella, Jordi; Kishinevsky, Michael; Kondratyev, Alex; Lavagno, Luciano; Yakovlev, Alex
Petrify is a tool for (1) manipulating concurrent specifications and (2) synthesis and optimization of asynchronous control circuits. Given a Petri Net (PN), a Signal Transition Graph (STG), or a Transition System (TS), it (1) generates another PN or STG that is simpler than the original description and (2) produces an optimized netlist of an asynchronous controller in the target gate library while preserving the specified input-output behavior. Given a specification, petrify provides the designer with a netlist of an asynchronous circuit and a PN-like description of the circuit behavior in terms of events and ordering relations between events. This ability to back-annotate to the specification level helps the designer control the design process. To transform a specification, petrify performs a token flow analysis of the initial PN and produces a transition system (TS). In the initial TS, all transitions with the same label are considered as one event. The TS is then transformed, and transitions are relabeled, to fulfill the conditions required to obtain a safe, irredundant PN. To synthesize an asynchronous implementation, petrify performs state assignment by solving the Complete State Coding problem. State assignment is coupled with logic minimization and speed-independent technology mapping to a target library. The final netlist is guaranteed to be speed-independent, i.e., hazard-free under any distribution of gate delays and multiple input changes satisfying the initial specification. The tool has been used for synthesis of PNs and PN composition [10], synthesis [7, 8, 9] and resynthesis [29] of asynchronous controllers, and it can also be applied in areas related to the analysis of concurrent programs. This paper provides an overview of petrify and the theory behind its main functions.
2019-06-03T08:44:03Z
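The token flow analysis step turns the initial PN into a transition system by exploring the reachable markings. Below is a minimal explicit-state Python sketch, not petrify's implementation. It assumes an ordinary safe net given as hypothetical `pre`/`post` dictionaries that map each transition to its sets of input and output places. A marking is the frozenset of marked places.

```python
from collections import deque


def reachability_graph(pre, post, m0):
    """Breadth-first exploration of the markings of a safe Petri net.

    pre/post: dict transition -> frozenset of input/output places.
    m0: frozenset of initially marked places.
    Returns (states, arcs) where arcs are (marking, transition, marking).
    """
    states = {m0}
    arcs = []
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for t in sorted(pre):
            if pre[t] <= m:  # enabled: every input place holds a token
                rest = m - pre[t]
                if rest & post[t]:
                    raise ValueError(f"firing {t} would put two tokens in a place")
                m2 = rest | post[t]
                arcs.append((m, t, m2))
                if m2 not in states:
                    states.add(m2)
                    queue.append(m2)
    return states, arcs
```

For a fork/join net where `t1` marks `p1` and `p2`, `t2` and `t3` move those tokens concurrently, and `t4` joins them back, the exploration finds 5 markings and 6 arcs. The interleavings of `t2` and `t3` form a diamond.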

Task generation and compile-time scheduling for mixed data-control embedded software
http://hdl.handle.net/2117/133638
Cortadella, Jordi; Kondratyev, Alex; Lavagno, Luciano; Massot, Marc; Moral Boadas, Sandra; Passerone, Claudio; Watanabe, Yosinori; Sangiovanni-Vincentelli, Alberto
The problem of optimal software synthesis for concurrent processes to be implemented on a single processor is addressed. The approach represents the concurrent processes with Petri nets, which provide a theoretical foundation both for the scheduling algorithm that sequentializes the concurrent processes and for the code generation step. The approach maximizes the amount of static scheduling to reduce the need for context switches and operating system intervention. Experimental results show the potential of our method to reduce software design time and errors.
2019-05-29T09:50:36Z

Verification of asynchronous circuits by BDD-based model checking of Petri nets
http://hdl.handle.net/2117/133572
Roig Mansilla, Oriol; Cortadella, Jordi; Pastor Llorens, Enric
This paper presents a methodology for the verification of speed-independent asynchronous circuits against a Petri net specification. The technique is based on symbolic reachability analysis, modeling both the specification and the gate-level network behavior by means of Boolean functions. These functions are handled efficiently using Binary Decision Diagrams. Algorithms for verifying the correctness of designs, as well as several circuit properties, are proposed. Finally, the applicability of our verification method has been demonstrated by checking the correctness of different benchmarks.
2019-05-28T11:21:20Z
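Symbolic reachability analysis is usually written as a least fixpoint over characteristic functions. The abstract does not give the paper's exact formulation, so the following is a hedged sketch of the standard one. Let $v$ be a vector of Boolean variables encoding a state, for example one per place of a safe net and one per circuit signal. Let $v'$ be its next-state copy, $\chi_{m_0}$ the characteristic function of the initial state, and $T(v, v')$ the transition relation.

```latex
\begin{aligned}
R_0(v)                   &= \chi_{m_0}(v), \\
\mathrm{Img}(R_i)(v')    &= \exists v.\; R_i(v) \land T(v, v'), \\
R_{i+1}(v)               &= R_i(v) \lor \mathrm{Img}(R_i)(v')\big|_{v' \leftarrow v}.
\end{aligned}
```

Iteration stops at the first $k$ with $R_{k+1} = R_k$. Reduced ordered BDDs are canonical for a fixed variable order, so this test amounts to comparing two BDD nodes.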

Exploiting the locality of memory references to reduce the address bus energy
http://hdl.handle.net/2117/133568
Musoll Cinca, Enric; Lang, Tomás; Cortadella, Jordi
The energy consumed at the I/O pins is a significant part of the overall chip consumption. This paper presents a method for encoding an external address bus that lowers its activity and thus decreases the energy. The method relies on the locality of memory references. Since applications favor a few working zones of their address space at each instant, for an address in one of these zones only the offset of this reference with respect to the previous reference to that zone needs to be sent over the bus, along with an identifier of the current working zone. This is combined with a modified one-hot encoding for the offset. An estimate of the area and energy overhead of the encoder/decoder is given; its effect is small. The approach has been applied to two memory-intensive examples, obtaining a bus-activity reduction of about 2/3 in both. Comparisons with previous bus-encoding methods show a significant improvement.
2019-05-28T09:38:05Z
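Below is a simplified Python sketch of the working-zone idea. It makes three assumptions: a fixed number of zone registers, a hypothetical hit window of strictly less than `max_offset` around the last address of each zone, and round-robin replacement on a miss. It sends the raw signed offset and does not model the offset encoding or the bus-activity measurement the abstract describes. Encoder and decoder stay consistent because both apply the same register updates.

```python
class WorkingZoneCodec:
    """Hypothetical working-zone encoder/decoder (one instance per bus side)."""

    def __init__(self, zones=4, max_offset=8):
        self.regs = [None] * zones   # last address referenced in each zone
        self.victim = 0              # round-robin replacement pointer (assumption)
        self.max_offset = max_offset

    def _install(self, addr):
        # On a miss, the new address opens a zone in the next victim register.
        self.regs[self.victim] = addr
        self.victim = (self.victim + 1) % len(self.regs)

    def encode(self, addr):
        """Return ("hit", zone_id, offset) or ("miss", full_address)."""
        for zid, base in enumerate(self.regs):
            if base is not None and abs(addr - base) < self.max_offset:
                self.regs[zid] = addr          # zone follows the latest reference
                return ("hit", zid, addr - base)
        self._install(addr)
        return ("miss", addr)

    def decode(self, word):
        """Rebuild the address from a bus word, mirroring encode's updates."""
        if word[0] == "hit":
            _, zid, off = word
            addr = self.regs[zid] + off
            self.regs[zid] = addr
            return addr
        _, addr = word
        self._install(addr)
        return addr
```

Take a trace that alternates between two arrays: 0x1000, 0x8000, 0x1004, 0x8004, 0x1008, 0x8008. The first two references miss and open one zone each. The remaining four hit, sending only a zone identifier and the offset 4.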

Synthesis of synchronous elastic architectures
http://hdl.handle.net/2117/133566
Cortadella, Jordi; Kishinevsky, Michael; Grundmann, Bill
A simple protocol for latency-insensitive design is presented. The main features of the protocol are the efficient implementation of elastic communication channels and an automatable design methodology. With this approach, fine-granularity elasticity can be introduced at the level of functional units (e.g., ALUs, memories). A formal specification of the protocol is defined, and an efficient scheme for implementing elasticity that involves no datapath overhead is presented. The opportunities this protocol opens for microarchitectural design are discussed.
2019-05-28T09:16:22Z
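The abstract does not spell out the protocol signals. As an assumption, this cycle-level Python sketch uses a valid/stop handshake on each channel, where a datum moves in a cycle only if valid is high and stop is low. It models a producer, a two-entry elastic buffer and a consumer that stalls according to a caller-supplied pattern. All names are hypothetical.

```python
from collections import deque


def simulate(items, consumer_stall, capacity=2, max_cycles=1000):
    """Producer -> elastic buffer -> consumer, one loop iteration per clock cycle.

    consumer_stall(cycle) -> bool plays the role of the consumer's stop signal.
    Returns (received_items, cycles_used).
    """
    source = deque(items)
    buf = deque()
    received = []
    for cycle in range(max_cycles):
        # Handshake signals are derived from the state at the start of the cycle.
        in_valid = bool(source)
        in_stop = len(buf) >= capacity   # back-pressure when the buffer is full
        out_valid = bool(buf)
        out_stop = consumer_stall(cycle)
        if out_valid and not out_stop:   # transfer on the output channel
            received.append(buf.popleft())
        if in_valid and not in_stop:     # transfer on the input channel
            buf.append(source.popleft())
        if not source and not buf:
            return received, cycle + 1
    raise RuntimeError("did not drain within max_cycles")
```

With no stalls, five items drain in six cycles. Any stall pattern only delays the stream; because back-pressure travels through the stop signals, data are neither lost nor reordered.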

Automating synthesis of asynchronous communication mechanisms
http://hdl.handle.net/2117/133459
Cortadella, Jordi; Costa Gorgônio, Kyller; Xia, Fei; Yakovlev, Alex
Asynchronous data communication mechanisms (ACMs) have been extensively studied as data connectors between independently timed processes in digital systems. Systematic ACM synthesis methods have been proposed in previous work. In this paper, we advance this work by developing algorithms and software tools that automate most of the ACM synthesis process. First, an interleaving specification is constructed in the form of a state graph; second, a Petri net model of an "ACM-type" is derived using the notion of an ACM-region. The method is applied to a number of "standard" writing and reading policies of ACMs with shared memory and unidirectional control variables.
2019-05-24T11:33:20Z