    • A flexible heterogeneous multi-core architecture 

      Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
      Conference report
      Open Access
      Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this ...
    • A framework for integrating data alignment, distribution, and redistribution in distributed memory multiprocessors 

      García Almiñana, Jordi; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2001-04)
      Article
      Restricted access - publisher's policy
      Parallel architectures with physically distributed memory provide a cost-effective scalability to solve many large scale scientific problems. However, these systems are very difficult to program and tune. In these systems, ...
    • A hardware runtime for task-based programming models 

      Tan, Xubin; Bosch, Jaume; Álvarez, Carlos; Jiménez González, Daniel; Ayguadé Parra, Eduard; Valero Cortés, Mateo (2019-09-01)
      Article
      Open Access
      Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only ...
    • A highly scalable parallel implementation of H.264 

      Azevedo, Arnaldo; Juurlink, Ben; Meenderinck, Cor; Terechko, Andrei; Hoogerbrugge, Jan; Álvarez Mesa, Mauricio; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (2011)
      Article
      Open Access
      Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation ...
    • A low cost split-issue technique to improve performance of SMT clustered VLIW processors 

      Gupta, Manoj; Sánchez Carracedo, Fermín; Llosa Espuny, José Francisco (2010)
      Conference report
      Open Access
      Very Long Instruction Word (VLIW) processors are a popular choice in embedded domain due to their hardware simplicity, low cost and low power consumption. Simultaneous MultiThreading (SMT) is a popular technique for ...
    • A module-based cell processor simulator 

      Cabarcas Jaramillo, Felipe; Rico Carro, Alejandro; Rodenas, David; Martorell Bofill, Xavier; Ramírez Bellido, Alejandro; Ayguadé Parra, Eduard (European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC), 2006)
      Conference lecture
      Open Access
      An interesting design alternative to replication-based chip multiprocessors is to create heterogeneous chip multiprocessors composed of several different cores, with one or more of them running the operating system and ...
    • A multithreading RISC-V implementation for Lagarto Architecture 

      Mendoza Escobar, Jonnatan (Universitat Politècnica de Catalunya, 2020-04)
      Master thesis
      Open Access
      The development of computer architecture standards for many years was mainly delegated to a few groups of companies that define most of the popular Instruction Set Architectures (ISAs). While the Information Technologies ...
    • A Simulation framework for hierarchical Network-on-Chip systems 

      San Pedro Martín, Javier de (Universitat Politècnica de Catalunya, 2012-06-22)
      Master thesis
      Open Access
      Today, even the simplest laptop processor has at least four cores and a graphics card containing tens of cores. It is not hard to find more performance-oriented processors with hundreds of cores, and it is expected to ...
    • A study of the communication cost of the FFT on torus multicomputers 

      Díaz de Cerio Ripalda, Luis Manuel; Valero García, Miguel; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 1995)
      Conference report
      Open Access
      The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on ...
    • A systolic algorithm for the fast computation of the connected components of a graph 

      Núñez, Fernando J.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1988)
      Conference report
      Open Access
      The authors consider the description of a systolic algorithm to solve the connected-component problem. It is executed in a ring topology with N processors, requiring O(N log N) time without regard to the graph's sparsity. ...
    • A transparent runtime data distribution engine for OpenMP 

      Nikolopoulos, Dimitrios; Papatheodorou, Theodore; Polychronopoulos, C D; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard (2001-07)
      Article
      Restricted access - publisher's policy
      This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the ...
    • A unified modulo scheduling and register allocation technique for clustered processors 

      Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
      Conference report
      Open Access
      This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more ...
    • Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming 

      Peredo, Oscar; Ortiz, Julián; Herrero Zaragoza, José Ramón (2015-12-01)
      Article
      Open Access
      The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners ...
    • Access to streams in multiprocessor systems 

      Valero Cortés, Mateo; Peirón Guardia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1993)
      Conference report
      Open Access
      When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present ...
    • Access to vectors in multi-module memories 

      Valero Cortés, Mateo; Peiron Guàrdia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1994)
      Conference report
      Open Access
      The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear ...
    • Adapting cache partitioning algorithms to pseudo-LRU replacement policies 

      Kedzierski, Kamil; Moreto Planas, Miquel; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2010)
      Conference report
      Open Access
      Recent studies have shown that cache partitioning is an efficient technique to improve throughput, fairness and Quality of Service (QoS) in CMP processors. The cache partitioning algorithms proposed so far assume Least ...
    • AMA: asynchronous management of accelerators for task-based programming models 

      Planas, Judit; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Elsevier, 2015)
      Conference report
      Open Access
      Computational science has benefited in the last years from emerging accelerators that increase the performance of scientific simulations, but using these devices hinders the programming task. This paper presents AMA: a set ...
    • An approximate analysis of synchronous multiple bus 

      González Peña, Luis Eduardo; Sanvicente Gargallo, Emilio (1985)
      External research report
      Open Access
      This paper presents an approximate analytic model for evaluating the performance of a loosely coupled multiprocessor architecture whose memory, organized in modules, is shared by all the processors. Each memory module (Mi) ...
    • Analysis and simulation of multiplexed single-bus networks with and without buffering 

      Llaberia Griñó, José M.; Valero Cortés, Mateo; Herrada Lillo, Enrique; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 1985)
      Conference report
      Open Access
      Performance issues of a single-bus interconnection network for multiprocessor systems, operating in a multiplexed way, are presented in this paper. Several models are developed and used to allow system performance evaluation. ...
    • Anaphase: a fine-grain thread decomposition scheme for speculative multithreading 

      Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez, Raul; González Colás, Antonio María (IEEE Computer Society, 2009)
      Conference report
      Open Access
      Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective to exploit thread-level parallelism (TLP) but do not provide benefits when executing serial ...