Now showing items 21-29 of 29

    • Multi-GPU parallelization of the NAS multi-zone parallel benchmarks 

      González Tallada, Marc; Morancho Llena, Enrique (2021-01-01)
      Article
      Open access
      GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of ...
    • Multi-GPU systems and Unified Virtual Memory for scientific applications: The case of the NAS multi-zone parallel benchmarks 

      González Tallada, Marc; Morancho Llena, Enrique (Elsevier, 2021-12)
      Article
      Open access
      GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of ...
    • NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP 

      González Tallada, Marc; Ayguadé Parra, Eduard; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Navarro, Nacho; Oliver Segura, José (2000-10)
      Article
      Access restricted by publisher policy
      This paper describes the support provided by the NanosCompiler to nested parallelism in OpenMP. The NanosCompiler is a source-to-source parallelizing compiler implemented around a hierarchical internal program representation ...
    • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL 

      Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2011)
      Article
      Access restricted by publisher policy
      In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and ...
    • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL 

      Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran González, Alejandro; González Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Springer, 2010)
      Conference paper
      Access restricted by publisher policy
      In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and ...
    • Runtime address space computation for SDSM systems 

      Balart Tarzan, Jairo; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2007)
      Article
      Open access
      This paper explores the benefits and limitations of using an inspector/executor approach for Software Distributed Shared Memory (SDSM) systems. The role of the inspector is to obtain a description of the address space ...
    • Speeding up distributed MapReduce applications using hardware accelerators 

      Becerra Fontal, Yolanda; Beltran Querol, Vicenç; Carrera Pérez, David; González Tallada, Marc; Torres Viñals, Jordi; Ayguadé Parra, Eduard (2009-09)
      Conference paper
      Open access
      In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors, to different system architectures, operating systems and networks. ...
    • Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks 

      Bertrán, Ramon; Buyuktosunoglu, Alper; Gupta, Meeta S.; González Tallada, Marc; Bose, Pradip (2012)
      Conference paper
      Open access
      Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system, through predictive pre-silicon ...
    • Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors 

      Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Navarro, Nacho; Corbalán González, Julita; González Tallada, Marc; Labarta Mancho, Jesús José (Association for Computing Machinery (ACM), 1999)
      Conference paper
      Open access
      This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization ...