Now showing items 18-29 of 29

    • Hardware-software coherence protocol for the coexistence of caches and local memories 

      Álvarez Martí, Lluc; Vilanova, Lluís; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard (2015-01-01)
      Article
      Open access
      Cache coherence protocols limit the scalability of multicore and manycore architectures and are responsible for an important amount of the power consumed in the chip. A good way to alleviate these problems is to introduce ...
    • Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications 

      González Tallada, Marc; Morancho Llena, Enrique (SAGE publishing, 2023-01-01)
      Article
      Open access
      Hybrid computer systems combine compute units (CUs) of different nature like CPUs, GPUs and FPGAs. Simultaneously exploiting the computing power of these CUs requires a careful decomposition of the applications into balanced ...
    • Hybrid access-specific software cache techniques for the cell BE architecture 

      O'Brien, Kathryn; O'Brien, Kevin; González Tallada, Marc; Vujic, Nikola; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Eichenberger, Alexandre E.; Chen, Tong; Sura, Zehra; Zhang, Tao (Association for Computing Machinery, 2008)
      Conference lecture
      Restricted access (publisher's policy)
      Ease of programming is one of the main impediments for the broad acceptance of multi-core systems with no hardware support for transparent data transfer between local and global memories. Software cache is a robust approach ...
    • Multi-GPU parallelization of the NAS multi-zone parallel benchmarks 

      González Tallada, Marc; Morancho Llena, Enrique (2021-01-01)
      Article
      Open access
      GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of ...
    • Multi-GPU systems and Unified Virtual Memory for scientific applications: The case of the NAS multi-zone parallel benchmarks 

      González Tallada, Marc; Morancho Llena, Enrique (Elsevier, 2021-12)
      Article
      Open access
      GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of ...
    • NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP 

      González Tallada, Marc; Ayguadé Parra, Eduard; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Navarro, Nacho; Oliver Segura, José (2000-10)
      Article
      Restricted access (publisher's policy)
      This paper describes the support provided by the NanosCompiler to nested parallelism in OpenMP. The NanosCompiler is a source-to-source parallelizing compiler implemented around a hierarchical internal program representation ...
    • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL 

      Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2011)
      Article
      Restricted access (publisher's policy)
      In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and ...
    • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL 

      Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Springer, 2010)
      Conference report
      Restricted access (publisher's policy)
      In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and ...
    • Runtime address space computation for SDSM systems 

      Balart Tarzan, Jairo; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2007)
      Article
      Open access
      This paper explores the benefits and limitations of using an inspector/executor approach for Software Distributed Shared Memory (SDSM) systems. The role of the inspector is to obtain a description of the address space ...
    • Speeding up distributed MapReduce applications using hardware accelerators 

      Becerra Fontal, Yolanda; Beltran Querol, Vicenç; Carrera Pérez, David; González Tallada, Marc; Torres Viñals, Jordi; Ayguadé Parra, Eduard (2009-09)
      Conference report
      Open access
      In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors, to different system architectures, operating systems and networks. ...
    • Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks 

      Bertrán, Ramon; Buyuktosunoglu, Alper; Gupta, Meeta S.; González Tallada, Marc; Bose, Pradip (2012)
      Conference report
      Open access
      Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system, through predictive pre-silicon ...
    • Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors 

      Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Navarro, Nacho; Corbalán González, Julita; González Tallada, Marc; Labarta Mancho, Jesús José (Association for Computing Machinery (ACM), 1999)
      Conference report
      Open access
      This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization ...