Journal articles
http://hdl.handle.net/2117/3124
2016-05-03T15:04:32Z

Integrated policy management framework for IaaS cloud middleware
http://hdl.handle.net/2117/86120
Canuto, Mauro; Guitart Fernández, Jordi
As a result of the rapid growth of Cloud Computing, several Cloud middleware solutions for the creation and automated management of virtual appliances have been released. Generally, these solutions offer some predefined scheduling policies to manage the provider infrastructure, but additional tuning of the policies is often needed to fully align their behavior with the provider interests. However, current middleware solutions do not offer ways to do this without stopping and recompiling the middleware. This paper proposes a solution that separates scheduling policies from the managers that interpret them, to allow the behavior of the management system to be changed without recoding the managers. In this way, the middleware can adapt to changing requirements by disabling policies or replacing old policies with new ones without shutting down the system. We propose a new policy language for the definition of management policies and we enable the EMOTIVE Cloud middleware to use these policies by integrating in the middleware the needed policy management framework for parsing and generating code on demand. We demonstrate with real experiments that our policy management framework mimics the expressiveness of scheduling policies in real Cloud middleware and provides more expressiveness if needed. The overhead of the policy management framework is low, but its performance degrades, especially in large datacenters, due to the low scalability of the EMOTIVE monitoring solution.
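The separation of policies from the managers that interpret them can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PolicyManager` class, the `score(host, vm)` convention, and the use of Python source text in place of the paper's policy language, which the abstract does not specify. It is not the EMOTIVE implementation.

```python
class PolicyManager:
    """Holds named scoring policies that can be swapped while the manager keeps running."""

    def __init__(self):
        self._policies = {}  # policy name -> compiled score(host, vm) function

    def install(self, name, source):
        """Compile policy text on demand and install or replace the policy `name`."""
        namespace = {}
        exec(source, namespace)  # hypothetical stand-in for a policy parser; trusted text only
        self._policies[name] = namespace["score"]

    def disable(self, name):
        """Remove a policy; unknown names are ignored."""
        self._policies.pop(name, None)

    def choose_host(self, hosts, vm):
        """Return the host with the highest total score over all active policies."""
        active = list(self._policies.values())
        return max(hosts, key=lambda h: sum(p(h, vm) for p in active))


# Two hypothetical policies written as plain text, as an operator might supply them.
CONSOLIDATE = "def score(host, vm):\n    return -host['free_cpu']\n"
SPREAD = "def score(host, vm):\n    return host['free_cpu']\n"

hosts = [{"name": "a", "free_cpu": 2}, {"name": "b", "free_cpu": 8}]
vm = {"cpu": 1}

manager = PolicyManager()
manager.install("placement", CONSOLIDATE)
first = manager.choose_host(hosts, vm)["name"]   # packs onto the fuller host
manager.install("placement", SPREAD)             # replaced at run time, no restart
second = manager.choose_host(hosts, vm)["name"]  # now prefers the emptier host
```

Because the manager looks the policies up at each decision, replacing or disabling one changes behavior on the next call without touching the manager code.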
2016-04-25T06:50:50Z

Compact finite difference modeling of 2D acoustic wave propagation
http://hdl.handle.net/2117/85841
Córdova, Luis; Rojas, Otilio; Otero Calviño, Beatriz; Castillo, Jose
We present two fourth-order compact finite difference (CFD) discretizations of the velocity–pressure formulation of the acoustic wave equation on 2D rectangular grids. The first method uses standard implicit CFD on nodal meshes and requires solving tridiagonal linear systems along each grid line, while the second scheme employs a novel set of mimetic CFD operators for explicit differentiation on staggered grids. Both schemes share a Crank–Nicolson time integration decoupled by the Peaceman–Rachford splitting technique to update discrete fields by alternating the coordinate direction of CFD differentiation (ADI-like iterations). For comparison purposes, we also implement a spatially fourth-order FD scheme using non-compact staggered mimetic operators in combination with second-order leapfrog time discretization. We apply these three schemes to model acoustic motion under homogeneous boundary conditions and compare their experimental convergence and execution times as the grid is successively refined. Both CFD schemes show fourth-order convergence, with a slight superiority of the mimetic version, which leads to more accurate results on fine grids. Conversely, the mimetic leapfrog method only achieves quadratic convergence and shows similar accuracy to CFD results exclusively on coarse grids. We finally observe that computation times of nodal CFD simulations are between four and five times higher than those spent by the mimetic CFD scheme with similar grid size. This significant performance difference is attributed to solving the embedded linear systems inherent to implicit CFD.
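To make the cost of implicit nodal CFD concrete, here is a minimal 1D sketch of the classical fourth-order compact (Padé) first derivative, (1/4) f'(i-1) + f'(i) + (1/4) f'(i+1) = 3 (f(i+1) - f(i-1)) / (4h), solved along one grid line with the Thomas algorithm. It is not the scheme from the paper: the boundary derivatives are assumed known (a simplification), and the function names are hypothetical.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def compact_derivative(f, h, dleft, dright):
    """Fourth-order compact derivative at all nodes, given the two boundary derivatives."""
    n = len(f)
    m = n - 2  # number of interior unknowns
    rhs = [3.0 * (f[i + 1] - f[i - 1]) / (4.0 * h) for i in range(1, n - 1)]
    rhs[0] -= 0.25 * dleft    # move the known boundary values to the right-hand side
    rhs[-1] -= 0.25 * dright
    interior = thomas([0.25] * m, [1.0] * m, [0.25] * m, rhs)
    return [dleft] + interior + [dright]


def max_error(n):
    """Maximum derivative error for f(x) = sin(2 pi x) on n nodes in [0, 1]."""
    h = 1.0 / (n - 1)
    xs = [i * h for i in range(n)]
    f = [math.sin(2 * math.pi * x) for x in xs]
    exact = [2 * math.pi * math.cos(2 * math.pi * x) for x in xs]
    approx = compact_derivative(f, h, exact[0], exact[-1])
    return max(abs(a - e) for a, e in zip(approx, exact))


# Halving h should reduce the error by roughly 2**4 for a fourth-order scheme.
ratio = max_error(41) / max_error(81)
```

Each grid line needs one such tridiagonal solve per differentiation, which is the embedded cost that an explicit staggered operator avoids.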
2016-04-18T18:23:37Z

Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures
http://hdl.handle.net/2117/85498
López Álvarez, David; Llosa Espuny, José Francisco; Valero Cortés, Mateo; Ayguadé Parra, Eduard
Loops are the main time-consuming part of numerical applications. The performance of the loops is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units. However, the high cost of this technique in terms of area and cycle time precludes the use of high degrees of replication. High values for the cycle time may clearly offset any gain in terms of number of execution cycles. High values for the area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (widening technique), which has also been used in some recent designs in which the width of the resources is increased (i.e., a single operation is performed over multiple data). Moreover, several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units (fusion technique), which reduce the latency of the combined operation and the number of resources used. We evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee the possible implementable alternatives. From this study, we conclude that if the cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying only replication. Also, we confirm that multiply-add fused units will have a significant impact in raising the performance of future processor architectures with a reasonable increase in cost.
2016-04-11T13:28:21Z

Thread assignment of multithreaded network applications in multicore/multithreaded processors
http://hdl.handle.net/2117/85462
Radojkovic, Petar; Cakarevic, Vladimir; Verdú Mulà, Javier; Pajuelo González, Manuel Alejandro; Cazorla Almeida, Francisco Javier; Nemirovsky, Mario; Valero Cortés, Mateo
The introduction of multithreaded processors comprised of a large number of cores with many shared resources makes thread scheduling, and in particular the optimal assignment of running threads to processor hardware contexts, one of the most promising ways to improve system performance. However, finding optimal thread assignments for workloads running on state-of-the-art multicore/multithreaded processors is an NP-complete problem. In this paper, we propose the BlackBox scheduler, a systematic method for thread assignment of multithreaded network applications running on multicore/multithreaded processors. The method requires minimum information about the target processor architecture and no data about the hardware requirements of the applications under study. The proposed method is evaluated with an industrial case study for a set of multithreaded network applications running on the UltraSPARC T2 processor. In most of the experiments, the proposed thread assignment method detected the best actual thread assignment in the evaluation sample. The method improved the system performance by 5 to 48 percent with respect to load balancing algorithms used in state-of-the-art OSs, and by up to 60 percent with respect to a naive thread assignment.
© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
2016-04-11T10:21:58Z

Revisiting distance-based record linkage for privacy-preserving release of statistical datasets
http://hdl.handle.net/2117/85339
Herranz Sotoca, Javier; Nin Guerrero, Jordi; Rodríguez, Pablo; Tassa, Tamir
Statistical Disclosure Control (SDC, for short) studies the problem of privacy-preserving data publishing in cases where the data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T' which is released to the public. Both utility and privacy aspects are very important in this setting. For utility, T' must allow data miners or statisticians to obtain similar results to those which would have been obtained from the original dataset T. For privacy, T' must significantly reduce the ability of an adversary to infer sensitive information on the data subjects in T. One of the main a posteriori measures that the SDC community has considered up to now when analyzing the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient. For this reason, we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk in publishing T' instead of T. After that, we describe how this new measure can be computed by the data owner and discuss the scalability of those computations. We conclude with extensive experimentation in which we compare the risk assessments offered by our novel measure and by the classical one, using well-known SDC protection methods. Those experiments validate our hypothesis that the GDBRL risk measure issues, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may in fact be higher. Hence, we strongly recommend that the SDC community consider the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms.
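As a rough illustration of the classical measure only, the sketch below computes a DBRL-style risk as it is commonly described: each record of T' is linked to its nearest record in T, and the risk is the fraction of links that hit the true source. It assumes numeric records, squared Euclidean distance, and that `protected[i]` was derived from `original[i]`; the function names are hypothetical. The GDBRL measure is not reproduced because the abstract does not define it.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dbrl_risk(original, protected):
    """Fraction of protected records whose nearest original record is their true source.

    Assumes protected[i] was produced from original[i]. Ties are resolved in
    favor of the lowest index, which is an arbitrary choice for this sketch.
    """
    correct = 0
    for i, record in enumerate(protected):
        nearest = min(range(len(original)), key=lambda j: sq_dist(record, original[j]))
        if nearest == i:
            correct += 1
    return correct / len(protected)


original = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
lightly_masked = [(1.0, 1.0), (9.0, 9.0), (19.0, 19.0)]     # small perturbation
heavily_masked = [(10.0, 9.0), (0.0, 1.0), (20.0, 21.0)]    # first two records cross over

risk_light = dbrl_risk(original, lightly_masked)
risk_heavy = dbrl_risk(original, heavily_masked)
```

A higher value means more records can be re-identified by nearest-neighbor linkage, so the lightly masked release scores as riskier than the heavily masked one.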
2016-04-07T10:24:46Z

ParaDIME: Parallel distributed infrastructure for minimization of energy for data centers
http://hdl.handle.net/2117/85255
Rethinagiri, Santhosh Kumar; Palomar Pérez, Óscar; Sobe, Anita; Yalcin, Gulay; Knauth, Thomas; Titos Gil, Rubén; Prieto, Pablo; Schneegaß, Malte; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Felber, Pascal; Fetzer, Christof; Milojevic, Dragomir
The dramatic environmental and economic impact of the ever-increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as one of the methods to face the phenomenon called dark silicon (only segments of a chip function concurrently due to power restrictions). On the other hand, designers use extreme-scale systems such as teradevices to meet the performance needs of their applications, which in turn increases the power consumption of the platform. In order to overcome these challenges, we need novel computing paradigms that address energy efficiency. One of the promising solutions is to incorporate parallel distributed methodologies at different abstraction levels.
The FP7 project ParaDIME focuses on this objective to provide different distributed methodologies (software-hardware techniques) at different abstraction levels to attack the power-wall problem. In particular, the ParaDIME framework will utilize: circuit and architecture operation below safe voltage limits for drastic energy savings, specialized energy-aware computing accelerators, heterogeneous computing, energy-aware runtime, approximate computing and power-aware message passing. The major outcome of the project will be a novel processor architecture for a heterogeneous distributed system that utilizes future device characteristics, runtime and programming model for drastic energy savings of data centers. Wherever possible, ParaDIME will adopt multidisciplinary techniques, such as hardware support for message passing, runtime energy optimization utilizing new hardware energy performance counters, use of accelerators for error recovery from sub-safe voltage operation, and approximate computing through annotated code.
Furthermore, we will establish and investigate the theoretical limits of energy savings at the device, circuit, architecture, runtime and programming model levels of the computing stack, as well as quantify the actual energy savings achieved by the ParaDIME approach for the complete computing stack in a real environment.
NOTICE: this is the author’s version of a work that was accepted for publication in "Microprocessors and Microsystems". Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in: Microprocessors and Microsystems, Volume 39, Issue 8 (November 2015). doi:10.1016/j.micpro.2015.06.005
2016-04-06T08:07:46Z

Thread assignment in multicore/multithreaded processors: A statistical approach
http://hdl.handle.net/2117/85248
Radojkovic, Petar; Carpenter, Paul M.; Moreto Planas, Miquel; Cakarevic, Vladimir; Verdú Mulà, Javier; Pajuelo González, Manuel Alejandro; Cazorla, Francisco J.; Nemirovsky, Mario; Valero Cortés, Mateo
The introduction of multicore/multithreaded processors, comprised of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular the assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the-art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time to improve an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment, based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture and application independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes.
An example is network systems, which typically provide a constant set of services that are known in advance, with network applications performing a similar processing algorithm for each packet in the system. In this paper, we validate our methods with an industrial case study for a set of multithreaded network applications on an UltraSPARC T2 processor. This article is an extension of our previous work [44], which was published in the Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012).
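One way to see why a few thousand random assignments can suffice is a simple probability argument, sketched below. It assumes independent uniform draws from a very large population of assignments, so that each draw lands in the best fraction p with probability p; the function names and the example values of p and confidence are illustrative assumptions, not figures from the paper, and the EVT-based prediction itself is not reproduced.

```python
import math


def prob_hit_top_fraction(n, p):
    """Probability that at least one of n independent random draws is in the best fraction p."""
    return 1.0 - (1.0 - p) ** n


def sample_size(p, confidence):
    """Smallest n such that prob_hit_top_fraction(n, p) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Example (assumed values): how many random assignments are needed to capture
# one of the best 0.1% of all assignments with 99% confidence?
needed = sample_size(p=0.001, confidence=0.99)
achieved = prob_hit_top_fraction(needed, 0.001)
```

Under these assumptions the required sample size depends only on p and the confidence level, not on the total number of possible assignments, which is consistent with the "several thousand" figure quoted in the abstract.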
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
2016-04-06T07:26:21Z

GeoSRS: a hybrid social recommender system for geolocated data
http://hdl.handle.net/2117/84070
Capdevila Pujol, Joan; Arias Vicente, Marta; Arratia Quesada, Argimiro Alejandro
We present GeoSRS, a hybrid recommender system for a popular location-based social network (LBSN), in which users are able to write short reviews on the places of interest they visit. Using state-of-the-art text mining techniques, our system recommends locations to users using as source the whole set of text reviews in addition to their geographical location. To evaluate our system, we have collected our own data sets by crawling the social network Foursquare. To do this efficiently, we propose the use of a parallel version of the Quadtree technique, which may be applicable to crawling/exploring other spatially distributed sources. Finally, we study the performance of GeoSRS on our collected data set and conclude that by combining sentiment analysis and text modeling, GeoSRS generates more accurate recommendations. The performance of the system improves as more reviews are available, which further motivates the use of large-scale crawling techniques such as the Quadtree.
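The general idea of quadtree-style crawling can be sketched as follows, under assumptions the abstract does not confirm: the spatial source answers a bounding-box query with at most `limit` results, so a box that comes back full is split into four quadrants and each is queried again. The fake query function, the parameter names, and the sequential recursion are all hypothetical; this is not the GeoSRS crawler and it does not call any real Foursquare API.

```python
import random


def make_capped_query(points, limit):
    """Fake spatial API: returns at most `limit` points inside a half-open box."""
    def query(box):
        xmin, ymin, xmax, ymax = box
        inside = [(x, y) for (x, y) in points
                  if xmin <= x < xmax and ymin <= y < ymax]
        return inside[:limit]
    return query


def crawl(box, query, limit, min_size=1e-9):
    """Collect every point in `box` by subdividing whenever a response is full."""
    xmin, ymin, xmax, ymax = box
    found = query(box)
    # Fewer than `limit` results means the response was not truncated.
    # min_size (checked on the width; boxes here are square) guards against
    # endless splitting when many points coincide.
    if len(found) < limit or (xmax - xmin) <= min_size:
        return set(found)
    xm, ym = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    quadrants = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                 (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
    results = set()
    for quad in quadrants:
        results |= crawl(quad, query, limit, min_size)
    return results


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
collected = crawl((0.0, 0.0, 1.0, 1.0), make_capped_query(points, limit=10), limit=10)
```

The four quadrant calls are independent of each other, so in a parallel variant they could be handed to separate workers; that step is left out here to keep the sketch short.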
2016-03-09T14:36:56Z

A new mimetic scheme for the acoustic wave equation
http://hdl.handle.net/2117/83776
A new mimetic scheme for the acoustic wave equation
Solano, Freysimar; Guevara-Jordan, Juan; Rojas, Otilio; Otero Calviño, Beatriz; Rodriguez, R.
A new mimetic finite difference scheme for solving the acoustic wave equation is presented. It combines novel second-order tensor mimetic discretizations in space with a leapfrog approximation in time to produce an explicit multidimensional scheme. Convergence analysis of the new scheme on a staggered grid shows that it can take larger time steps than standard finite difference schemes based on a ghost-point formulation. A set of numerical test problems gives evidence of the versatility of the new mimetic scheme in handling general boundary conditions.
2016-03-03T15:40:32Z
Solano, Freysimar
Guevara-Jordan, Juan
Rojas, Otilio
Otero Calviño, Beatriz
Rodriguez, R.

Runtime-aware architectures: a first approach
http://hdl.handle.net/2117/83615
Runtime-aware architectures: a first approach
Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José
In the last few years, the traditional ways of keeping hardware performance growing at the rate predicted by Moore's Law have vanished. When unicores were the norm, hardware design was decoupled from the software stack thanks to a well-defined Instruction Set Architecture (ISA). This simple interface allowed applications to be developed without worrying too much about the underlying hardware, while hardware designers were able to aggressively exploit instruction-level parallelism (ILP) in superscalar processors. With the emergence of multicores and parallel applications, this simple interface started to leak. As a consequence, the role of decoupling applications from the hardware was moved to the runtime system. Efficiently using the underlying hardware from this runtime without exposing its complexities to the application has been the target of very active and prolific research in recent years.
Current multicores are designed as simple symmetric multiprocessors (SMP) on a chip. However, we believe that this is not enough to overcome all the problems that multicores already face. It is our position that the runtime has to drive the design of future multicores to overcome their restrictions in terms of power, memory, programmability and resilience. In this paper, we introduce a first approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective.
2016-03-01T12:41:11Z
Valero Cortés, Mateo
Moreto Planas, Miquel
Casas Guix, Marc
Ayguadé Parra, Eduard
Labarta Mancho, Jesús José