  • Achieving high memory performance from heterogeneous architectures with the SARC programming model 

    Ferrer, Roger; Beltran Querol, Vicenç; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (ACM, 2009)
    Conference lecture
    Restricted access - publisher's policy
    Current heterogeneous multicore architectures, including the Cell/B.E., GPUs, and future developments, like Larrabee, require enormous programming efforts to efficiently run current parallel applications, achieving high ...
  • A novel asynchronous software cache implementation for the Cell-BE processor 

    Balart, J; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Sura, Z; Chen, T; Zhang, T; O'Brien, Kevin; O'Brien, Kathryn (2008-10)
    Article
    Restricted access - publisher's policy
    This paper describes the implementation of a runtime library for asynchronous communication in the Cell BE processor. The runtime library implementation provides several services that allow the compiler to generate ...
  • A proposal for error handling in OpenMP 

    Duran González, Alejandro; Ferrer, Roger; Costa Prats, Juan José; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2006-06)
    Article
    Restricted access - publisher's policy
    OpenMP has focused on performance for numerical applications, but when we try to move this focus to other kinds of applications, such as Web servers, we detect one important shortcoming. In these applications, performance ...
  • Automatic multilevel parallelization using OpenMP 

    Jin, H; Jost, G; Yan, J; Ayguadé Parra, Eduard; González Tallada, Marc; Martorell Bofill, Xavier (2004-06)
    Article
    Restricted access - publisher's policy
    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler ...
  • Automatic pre-fetch and modulo scheduling transformations for the cell BE architecture 

    Vujic, N; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (2008-01)
    Article
    Restricted access - publisher's policy
    Ease of programming is one of the main impediments for the broad acceptance of multi-core systems with no hardware support for transparent data transfer between local and global memories. Software cache is a robust approach ...
  • Coarse grain parallelization of deep neural networks 

    González Tallada, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference lecture
    Restricted access - publisher's policy
    Deep neural networks (DNN) have recently achieved extraordinary results in domains like computer vision and speech recognition. An essential element for this success has been the introduction of high performance computing ...
  • Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures 

    Álvarez Martí, Lluc; Vilanova, Lluís; Moreto Planas, Miquel; Casas, Marc; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
    Conference report
    Open Access
    The increasing number of cores in manycore architectures causes important power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a ...
  • Dual-level parallelism exploitation with OpenMP in coastal ocean circulation modeling 

    González Tallada, Marc; Ayguadé Parra, Eduard; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Luong, P V (2002-05)
    Article
    Restricted access - publisher's policy
    Two alternative dual-level parallel implementations of the Multiblock Grid Princeton Ocean Model (MGPOM) are compared in this paper. The first one combines the use of two programming paradigms: message passing with the ...
  • Employing nested OpenMP for the parallelization of multi-zone computational fluid dynamics applications 

    Ayguadé Parra, Eduard; González Tallada, Marc; Martorell Bofill, Xavier; Jost, G (2006-05)
    Article
    Restricted access - publisher's policy
    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study, we use the NanosCompiler that supports nesting of OpenMP ...
  • Evaluation of memory performance on the cell BE with the SARC programming model 

    Ferrer, Roger; González Tallada, Marc; Silla, Federico; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2008)
    Conference lecture
    Restricted access - publisher's policy
    With the advent of multicore architectures, especially with the heterogeneous ones, both computational and memory top performance are difficult to obtain using traditional programming models. Usually, programmers have to ...
  • Experiences parallelizing a web server with OpenMP 

    Balart Tarzan, Jairo; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2006-06)
    Article
    Restricted access - publisher's policy
    Multi-threaded web servers are typically parallelized by hand using the pthreads library. OpenMP has rarely been used to parallelize this kind of application, although we foresee that it can be a great tool for network ...
  • Extending OpenMP to survive the heterogeneous multi-core era 

    Quintana-Ortí, Enrique S.; Planas, Judit; Pérez Cáncer, Josep Maria; Mayo, Rafael; Martorell Bofill, Xavier; Martinell, Lluis; Labarta Mancho, Jesús José; Jiménez González, Daniel; Ayguadé Parra, Eduard; Badia Sala, Rosa Maria; Bellens, Pieter; Cabrera, Daniel; Duran González, Alejandro; Ferrer, Roger; González Tallada, Marc; Igual, Francisco D. (2010-10)
    Article
    Restricted access - publisher's policy
  • Hardware-software coherence protocol for the coexistence of caches and local memories 

    Álvarez Martí, Lluc; Vilanova, Lluís; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard (2015-01-01)
    Article
    Open Access
    Cache coherence protocols limit the scalability of multicore and manycore architectures and are responsible for an important amount of the power consumed in the chip. A good way to alleviate these problems is to introduce ...
  • Hybrid access-specific software cache techniques for the cell BE architecture 

    O'Brien, Kathryn; O'Brien, Kevin; González Tallada, Marc; Vujic, Nikola; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Eichenberger, Alexandre E.; Chen, Tong; Sura, Zehra; Zhang, Tao (Association for Computing Machinery, 2008)
    Conference lecture
    Restricted access - publisher's policy
    Ease of programming is one of the main impediments for the broad acceptance of multi-core systems with no hardware support for transparent data transfer between local and global memories. Software cache is a robust approach ...
  • Nanos compiler: supporting flexible multilevel parallelism exploitation in OpenMP 

    González Tallada, Marc; Ayguadé Parra, Eduard; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Navarro, Nacho; Oliver, J. (2000-10)
    Article
    Restricted access - publisher's policy
    This paper describes the support provided by the NanosCompiler to nested parallelism in OpenMP. The NanosCompiler is a source-to-source parallelizing compiler implemented around a hierarchical internal program representation ...
  • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL 

    Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Springer, 2010)
    Conference report
    Restricted access - publisher's policy
    In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and ...
  • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL 

    Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2011)
    Article
    Restricted access - publisher's policy
    In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and ...
  • Runtime address space computation for SDSM systems 

    Balart, J; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2006-11)
    Article
    Restricted access - publisher's policy
    This paper explores the benefits and limitations of using an inspector/executor approach for Software Distributed Shared Memory (SDSM) systems. The role of the inspector is to obtain a description of the address space ...
  • Speeding up distributed MapReduce applications using hardware accelerators 

    Becerra Fontal, Yolanda; Beltran Querol, Vicenç; Carrera Pérez, David; González Tallada, Marc; Torres Viñals, Jordi; Ayguadé Parra, Eduard (2009-09)
    Conference report
    Open Access
    In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors, to different system architectures, operating systems and networks. ...
  • Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks 

    Bertrán, Ramon; Buyuktosunoglu, Alper; Gupta, Meeta S.; González Tallada, Marc; Bose, Pradip (2012)
    Conference report
    Open Access
    Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system, through predictive pre-silicon ...