  • Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming 

    Peredo, Oscar; Ortiz, Julián; Herrero Zaragoza, José Ramón (2015-12-01)
    Article
    Open Access
    The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners ...
  • Access to streams in multiprocessor systems 

    Valero Cortés, Mateo; Peirón Guardia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1993)
    Conference report
    Open Access
    When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present ...
  • Access to vectors in multi-module memories 

    Valero Cortés, Mateo; Peiron Guàrdia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1994)
    Conference report
    Open Access
    The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear ...
  • Adapting cache partitioning algorithms to pseudo-LRU replacement policies 

    Kedzierski, Kamil; Moreto Planas, Miquel; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2010)
    Conference report
    Open Access
    Recent studies have shown that cache partitioning is an efficient technique to improve throughput, fairness and Quality of Service (QoS) in CMP processors. The cache partitioning algorithms proposed so far assume Least ...
  • A framework for integrating data alignment, distribution, and redistribution in distributed memory multiprocessors 

    García Almiñana, Jordi; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2001-04)
    Article
    Restricted access - publisher's policy
    Parallel architectures with physically distributed memory provide a cost-effective scalability to solve many large scale scientific problems. However, these systems are very difficult to program and tune. In these systems, ...
  • A highly scalable parallel implementation of H.264 

    Azevedo, Arnaldo; Juurlink, Ben; Meenderinck, Cor; Terechko, Andrei; Hoogerbrugge, Jan; Álvarez Mesa, Mauricio; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (2011)
    Article
    Open Access
    Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation ...
  • A low cost split-issue technique to improve performance of SMT clustered VLIW processors 

    Gupta, Manoj; Sánchez Carracedo, Fermín; Llosa Espuny, José Francisco (2010)
    Conference report
    Open Access
    Very Long Instruction Word (VLIW) processors are a popular choice in the embedded domain due to their hardware simplicity, low cost and low power consumption. Simultaneous MultiThreading (SMT) is a popular technique for ...
  • AMA: asynchronous management of accelerators for task-based programming models 

    Planas, Judit; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Elsevier, 2015)
    Conference report
    Open Access
    Computational science has benefited in the last years from emerging accelerators that increase the performance of scientific simulations, but using these devices hinders the programming task. This paper presents AMA: a set ...
  • A module-based cell processor simulator 

    Cabarcas Jaramillo, Felipe; Rico Carro, Alejandro; Rodenas, David; Martorell Bofill, Xavier; Ramírez Bellido, Alejandro; Ayguadé Parra, Eduard (European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC), 2006)
    Conference lecture
    Open Access
    An interesting design alternative to replication-based chip multiprocessors is to create heterogeneous chip multiprocessors composed of several different cores, with one or more of them running the operating system and ...
  • Analysis and simulation of multiplexed single-bus networks with and without buffering 

    Llaberia Griñó, José M.; Valero Cortés, Mateo; Herrada Lillo, Enrique; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 1985)
    Conference report
    Open Access
    Performance issues of a single-bus interconnection network for multiprocessor systems, operating in a multiplexed way, are presented in this paper. Several models are developed and used to allow system performance evaluation. ...
  • Anaphase: a fine-grain thread decomposition scheme for speculative multithreading 

    Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez, Raul; González Colás, Antonio María (IEEE Computer Society, 2009)
    Conference report
    Open Access
    Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective to exploit thread-level parallelism (TLP) but do not provide benefits when executing serial ...
  • An approximate analysis of synchronous multiple bus 

    González Peña, Luis Eduardo; Sanvicente Gargallo, Emilio (1985)
    External research report
    Open Access
    This paper presents an approximate analytic model for evaluating the performance of a loosely coupled multiprocessor architecture whose memory, organized in modules, is shared by all the processors. Each memory module (Mi) ...
  • Animaciones interactivas para la enseñanza y aprendizaje de los protocolos de coherencia de cachés 

    Alcón Laguéns, Alberto; Barrachina Mir, Sergio; Quintana Ortí, Enrique S. (Universidad de Sevilla. Escuela Técnica Superior de Ingeniería Informática, 2011-07-05)
    Conference lecture
    Open Access
    Among the learning objectives of advanced computer architecture courses is usually that students be able to describe and analyze the operation of the coherence protocols of ...
  • A Simulation framework for hierarchical Network-on-Chip systems 

    San Pedro Martín, Javier de (Universitat Politècnica de Catalunya, 2012-06-22)
    Master thesis
    Open Access
    Today, even the simplest laptop processor has at least four cores and a graphics card containing tens of cores. It is not hard to find more performance-oriented processors with hundreds of cores, and it is expected to ...
  • A study of the communication cost of the FFT on torus multicomputers 

    Díaz de Cerio Ripalda, Luis Manuel; Valero García, Miguel; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 1995)
    Conference report
    Open Access
    The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on ...
  • A systolic algorithm for the fast computation of the connected components of a graph 

    Núñez, Fernando J.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1988)
    Conference report
    Open Access
    The authors consider the description of a systolic algorithm to solve the connected-component problem. It is executed in a ring topology with N processors, requiring O(N log N) time without regard to the graph's sparsity. ...
  • A transparent runtime data distribution engine for OpenMP 

    Nikolopoulos, Dimitrios; Papatheodorou, Theodore; Polychronopoulos, C D; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard (2001-07)
    Article
    Restricted access - publisher's policy
    This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the ...
  • A unified modulo scheduling and register allocation technique for clustered processors 

    Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
    Conference report
    Open Access
    This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more ...
  • Automatic exploration of potential parallelism in sequential applications 

    Subotic, Vladimir; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2014)
    Conference report
    Restricted access - publisher's policy
    The multicore era has increased the need for highly parallel software. Since automatic parallelization turned out ineffective for many production codes, the community hopes for the development of tools that may assist ...
  • Automatic pre-fetch and modulo scheduling transformations for the cell BE architecture 

    Vujic, N; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (2008-01)
    Article
    Restricted access - publisher's policy
    Ease of programming is one of the main impediments for the broad acceptance of multi-core systems with no hardware support for transparent data transfer between local and global memories. Software cache is a robust approach ...