Book chapters (Capítols de llibre)
http://hdl.handle.net/2117/80532

Challenges and opportunities for RISC-V architectures towards genomics-based workloads
http://hdl.handle.net/2117/396068
Gómez Sánchez, Gonzalo; Call Barreiro, Aaron; Teruel García, Xavier; Alonso Parrilla, Lorena; Morán Castany, Ignasi; Pérez Elena, Miguel Ángel; Torrents Arenales, David; Berral García, Josep Lluís
The use of large-scale supercomputing architectures is a hard requirement for scientific Big-Data applications. An example is genomics analytics, where millions of data transformations and tests per patient must be performed to find relevant clinical indicators. To ensure open and broad access to high-performance technologies, governments and academia are therefore pushing toward the introduction of novel computing architectures in large-scale scientific environments. This is the case of RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, here we present the Variant-Interaction Analytics use case, together with its benchmarking suite and datasets. Through this use case, we search for possible genetic interactions using computational and statistical methods, providing a representative case of heavy ETL (Extract, Transform, Load) data processing. Current implementations run on x86-based supercomputers (e.g. MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and future plans propose RISC-V as part of the next MareNostrum generations. Here we describe the Variant Interaction use case, highlighting the characteristics that leverage high-performance computing, and we indicate the caveats and challenges for the RISC-V developments and designs to come, based on a first comparison between x86 and RISC-V architectures on real Variant Interaction executions over real hardware implementations.
The DeepHealth Toolkit: A key European free and open-source software for deep learning and computer vision ready to exploit heterogeneous HPC and cloud architectures
http://hdl.handle.net/2117/367066
Aldinucci, Marco; Atienza, David; Bolelli, Federico; Caballero, Mónica; Colonnelli, Iacopo; Quiñones, Eduardo
We are currently immersed in the convergence of Big Data, High-Performance Computing and Artificial Intelligence. Technological progress in these three areas has accelerated in recent years, forcing different players, such as software companies and stakeholders, to move quickly. The European Union is dedicating substantial resources to maintaining its relevant position in this scenario, funding projects to implement large-scale pilot testbeds that combine the latest advances in Artificial Intelligence, High-Performance Computing, Cloud and Big Data technologies. The DeepHealth project is one such example, focused on the health sector. Its main outcome is the DeepHealth toolkit, a unified European framework that offers deep learning and computer vision capabilities, fully adapted to exploit underlying heterogeneous High-Performance Computing, Big Data and cloud architectures, and ready to be integrated into any software platform to facilitate the development and deployment of new applications for specific problems in any sector. This toolkit is intended to be one of the European contributions to the field of AI. This chapter introduces the toolkit, its main components and complementary tools, providing a clear view to facilitate and encourage its adoption and wide use by the European community of developers of AI-based solutions and by data scientists working in the healthcare sector and beyond.
A quaternion deterministic monogenic CNN layer for contrast invariance
http://hdl.handle.net/2117/349717
Moya Sánchez, Eduardo Ulises; Xambó Descamps, Sebastián; Salazar Colores, Sebastián; Sánchez-Pérez, Abraham; Cortés García, Claudio Ulises
Deep learning (DL) is attracting considerable interest as it currently achieves remarkable performance in many branches of science and technology. However, current DL models cannot guarantee capabilities of mammalian visual systems, such as invariance to lighting changes. This paper proposes a deterministic entry layer capable of classifying images even under low-contrast conditions. We achieve this through an improved version of the quaternion monogenic wavelets. We simulated atmospheric degradation on the CIFAR-10 and the Dogs and Cats datasets to generate realistic contrast degradations of the images. The most important result is that the accuracy of a network using our layer is substantially more robust to illumination changes than that of networks without it.
Managing failures in task-based parallel workflows in distributed computing environments
http://hdl.handle.net/2117/328312
Ejarque, Jorge; Bertran, Marta; Álvarez Cid-Fuentes, Javier; Conejero, Javier; Badia Sala, Rosa Maria
Current scientific workflows are large and complex. They normally perform thousands of simulations whose results, combined with searching and data-analytics algorithms to infer new knowledge, generate very large amounts of data. To this end, workflows comprise many tasks, and some of them may fail. Most work on failure management in workflow managers and runtimes focuses on recovering from failures caused by resources (retrying or resubmitting the failed computation on other resources, etc.). However, some failures are caused by the application itself (corrupted data, algorithms that do not converge under certain conditions, etc.), and those fault-tolerance mechanisms are not sufficient to complete the workflow execution successfully. In these cases, developers have to add code to their applications to prevent and manage the possible failures. In this paper, we propose a simple interface and a set of transparent runtime mechanisms to simplify how scientists deal with application-level failures in task-based parallel workflows. We have validated our proposal with use cases from e-science and machine learning to show the benefits of the proposed interface and mechanisms in terms of programming productivity and performance.
Predictable parallel programming with OpenMP
http://hdl.handle.net/2117/191379
Serrano Gracia, María Astón; Royuela Alcázar, Sara; Marongiu, Andrea; Quiñones Moreno, Eduardo
This chapter motivates the use of the OpenMP (Open Multi-Processing) parallel programming model for developing future critical real-time embedded systems, and analyzes the time-predictable properties of the OpenMP tasking model. Moreover, it presents the set of compiler techniques needed to extract the timing information of an OpenMP program in the form of an OpenMP Directed Acyclic Graph, or OpenMP-DAG.
YOLO: Speeding Up VM and Docker Boot Time by Reducing I/O Operations
http://hdl.handle.net/2117/168305
Nguyen, Thuy L.; Nou Castell, Ramon; Lebre, Adrien
Although this may come as a surprise, booting a Docker-based container can take as long as booting a virtual machine in highly consolidated cloud scenarios. Because boot duration defines how quickly an application can react to demand fluctuations (horizontal elasticity), this time is critical, and we present in this paper the YOLO mechanism (You Only Load Once). YOLO reduces the number of I/O operations generated during a boot process by relying on a boot image abstraction: a subset of the VM/container image that contains the data blocks necessary to complete the boot operation. Whenever a VM or a container is booted, YOLO intercepts all read accesses and serves them directly from the boot image, which has been stored locally on fast-access storage devices (e.g., memory, SSD, etc.). In addition to YOLO, we show that another mechanism is required to ensure that files related to VM/container management systems remain in the cache of the host OS. Our results show that the use of these two techniques can speed up boot duration 2–13 times for VMs and 2 times for containers. The benefit for containers is limited by internal choices of the Docker design. We underline that our proposal can easily be applied to other types of virtualization (e.g., Xen) and containerization, because it requires no intrusive modifications to the virtualization/container management system or the base image structure.
Towards an energy-aware framework for application development and execution in heterogeneous parallel architectures
http://hdl.handle.net/2117/133685
Djemame, Karim; Kavanagh, Richard; Kelefouras, Vasilios; Aguilà, Adrià; Ejarque, Jorge; Badia Sala, Rosa Maria; García-Pérez, David; Pezuela, Clara; Deprez, Jean-Christophe; Guedria, Lofti; De Landtsheer, Renaud; Georgiou, Yiannis
The Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (TANGO) project’s goal is to characterise factors which affect power consumption in software development and operation for Heterogeneous Parallel Hardware (HPA) environments. Its main contribution is the combination of requirements engineering and design modelling for self-adaptive software systems, with power consumption awareness in relation to these environments. The energy efficiency and application quality factors are integrated into the application lifecycle (design, implementation and operation). To support this, the key novelty of the project is a reference architecture and its implementation. Moreover, a programming model with built-in support for various hardware architectures including heterogeneous clusters, heterogeneous chips and programmable logic devices is provided. This leads to a new cross-layer programming approach for heterogeneous parallel hardware architectures featuring software and hardware modelling. Application power consumption and performance, data location and time-criticality optimization, as well as security and dependability requirements on the target hardware architecture are supported by the architecture.
dReDBox: A Disaggregated Architectural Perspective for Data Centers
http://hdl.handle.net/2117/132696
Alachiotis, Nikolaos; Andronikakis, Andreas; Papadakis, Orion; Theodoropoulos, Dimitris; Pnevmatikatos, Dionisios; Syrivelis, Dimitris; Reale, Andrea; Katrinis, Kostas; Zervas, George; Mishra, Vaibhawa; Yuan, Hui; Syrigos, Ilias; Igoumenos, Ioannis; Korakis, Thanasis; Torrents, Marti; Zyulkyarov, Ferad
Data centers are currently constructed with fixed blocks (blades); the hard boundaries of this approach lead to suboptimal utilization of resources and increased energy requirements. The dReDBox (disaggregated Recursive Datacenter in a Box) project addresses the problem of fixed resource proportionality in next-generation, low-power data centers by proposing a paradigm shift toward finer resource allocation granularity, where the unit is the function block rather than the mainboard tray. This introduces various challenges at the system design level, requiring elastic hardware architectures, efficient software support and management, and programmable interconnect. Memory and hardware accelerators can be dynamically assigned to processing units to boost application performance, while high-speed, low-latency electrical and optical interconnect is a prerequisite for realizing the concept of data center disaggregation. This chapter presents the dReDBox hardware architecture and discusses design aspects of the software infrastructure for resource allocation and management. Furthermore, initial simulation and evaluation results for accessing remote, disaggregated memory are presented, employing benchmarks from the Splash-3 and the CloudSuite benchmark suites.
Towards an OpenMP Specification for Critical Real-Time Systems
http://hdl.handle.net/2117/125131
Serrano Gracia, María Astón; Royuela Alcázar, Sara; Quiñones Moreno, Eduardo
OpenMP is increasingly being considered as a convenient parallel programming model to cope with the performance requirements of critical real-time systems. Recent works demonstrate that OpenMP makes it possible to derive guarantees on the functional and timing behavior of the system, a fundamental requirement of such systems. These works, however, focus only on the exploitation of fine-grained parallelism and do not take into account the peculiarities of critical real-time systems, which are commonly composed of a set of concurrent functionalities. OpenMP allows exploiting the parallelism exposed within real-time tasks and among them. This paper analyzes the challenges of combining the concurrency model of real-time tasks with the parallel model of OpenMP. We demonstrate that OpenMP is suitable for developing advanced critical real-time systems by virtue of a few changes to the specification, which enable the scheduling behavior desired in such systems (regarding execution priorities, preemption, migration and allocation strategies).
Implementation of the K-Means Algorithm on Heterogeneous Devices: A Use Case Based on an Industrial Dataset
http://hdl.handle.net/2117/114842
Xu, Ying hao; Vidal, Miquel; Arejita, Beñat; Diaz, Javier; Alvarez, Carlos; Jiménez González, Daniel; Martorell Bofill, Xavier; Mantovani, Filippo
This paper presents and analyzes a heterogeneous implementation of an industrial use case based on K-means that targets symmetric multiprocessing (SMP), GPUs and FPGAs. We present how the application can be optimized from an algorithmic point of view and how this optimization performs on two heterogeneous platforms. The presented implementation relies on the OmpSs programming model, which introduces a simplified pragma-based syntax for the communication between the main processor and the accelerators. Performance can be improved by having the programmer explicitly specify the data memory accesses or copies. As expected, the newer SMP+GPU system studied is more powerful than the older SMP+FPGA system. However, the latter is enough to fulfill the requirements of our use case, and we show that it uses less energy when considering only the active power of the execution.