
  • A Bicriteria Simulated Annealing Algorithm for Scheduling Jobs on Parallel Machines with Sequence Dependent Setup Times 

    Persson, Rasmus (Universitat Politècnica de Catalunya, 2008-09)
    Master thesis (pre-Bologna period)
    Open Access
    The study considers the scheduling problem of identical parallel machines subject to minimization of the maximum completion time and the maximum tardiness expressed in a linear convex objective function. The maximum ...
  • A case for merging the ILP and DLP paradigms 

    Quintana Rodríguez, Francisca; Espasa Sans, Roger; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1998)
    Conference report
    Open Access
    The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that cannot be achieved ...
  • A case study of hybrid dataflow and shared-memory programming models: Dependency-based parallel game engine 

    Gajinov, Vladimir; Eric, Igor; Stojanovic, Saša; Milutinovic, Veljko; Unsal, Osman Sabri; Ayguadé Parra, Eduard; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Restricted access - publisher's policy
    Recently proposed hybrid dataflow and shared memory programming models combine these two underlying models in order to support a wider range of problems naturally. The effectiveness of such hybrid models for parallel ...
  • Accelerating boosting-based face detection on GPUs 

    Oro, David; Fernández, Carles; Segura, Carlos; Martorell Bofill, Xavier; Hernando Pericás, Francisco Javier (2012)
    Conference report
    Restricted access - publisher's policy
    The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level ...
  • A complexity-effective simultaneous multithreading architecture 

    Acosta Ojeda, Carmelo Alexis; Falcón Samper, Ayose Jesús; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
    Conference report
    Open Access
    Different applications may exhibit radically different behaviors and thus have very different requirements in terms of hardware support. In simultaneous multithreading (SMT) architectures, the hardware is shared among ...
  • A content aware integer register file organization 

    González García, Rubén; Cristal Kestelman, Adrián; Ortega Fernández, Daniel; Veidenbaum, Alex; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2004)
    Conference report
    Open Access
    A register file is a critical component of a modern superscalar processor. It has a large number of entries and read/write ports in order to enable high levels of instruction parallelism. As a result, the register file's ...
  • A cost-effective clustered architecture 

    Canal Corretger, Ramon; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 1999)
    Conference report
    Open Access
    In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the floating-point cluster is extended to execute ...
  • Adaptive, efficient, parallel execution of parallel programs 

    Sohi, Guri (Barcelona Supercomputing Center, 2016-09-10)
    Conference report
    Open Access
    Future parallel processors will be heterogeneous, be increasingly less reliable, and operate in dynamically changing operating conditions. This will result in a constantly varying pool of hardware resources which can greatly ...
  • A data flow language to develop high performance computing DSLs 

    Fernandez, Alejandro; Beltran, Vicenç; Mateo, Sergi; Patejko, Thomas; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2014)
    Conference report
    Restricted access - publisher's policy
    Developing complex scientific applications on high performance systems requires both domain knowledge and expertise in parallel and distributed programming models. In addition, modern high performance systems are heterogeneous, ...
  • A distributed processor state management architecture for large-window processors 

    González, Isidro; Galluzzi, Marco; Veidenbaum, Alexander V.; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference report
    Open Access
    Processor architectures with large instruction windows have been proposed to expose more instruction-level parallelism (ILP) and increase performance. Some of the proposed architectures replace a re-order buffer (ROB) with ...
  • A flexible heterogeneous multi-core architecture 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
    Conference report
    Open Access
    Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this ...
  • A hardware runtime for task-based programming models 

    Tan, Xubin; Bosch, Jaume; Álvarez, Carlos; Jiménez González, Daniel; Ayguadé Parra, Eduard; Valero Cortés, Mateo (2019-09-01)
    Article
    Open Access
    Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only ...
  • A highly scalable parallel implementation of H.264 

    Azevedo, Arnaldo; Juurlink, Ben; Meenderinck, Cor; Terechko, Andrei; Hoogerbrugge, Jan; Álvarez Mesa, Mauricio; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (2011)
    Article
    Open Access
    Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation ...
  • Algoritmo paralelo para la eliminación de superficies ocultas 

    Roselló Balanyà, Celestí; Peña Marí, Ricardo (Marcombo, 1987)
    Conference report
    Open Access
    An algorithmic notation for parallel programming is presented in which recursion and the gradual introduction of parallelism play an important role. This notation is applied to the presentation of a version ...
  • ALOJA: a systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness 

    Poggi Mastrokalo, Nicolas; Carrera Pérez, David; Call, Aaron; Mendoza, Sergio; Becerra Fontal, Yolanda; Torres Viñals, Jordi; Ayguadé Parra, Eduard; Gagliardi, Fabrizio; Labarta Mancho, Jesús José; Reinauer, Rob; Vujic, Nikola; Green, Daron; Blakeley, Jose (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Open Access
    This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of cost-effectiveness of Hadoop deployments and reports its initial results. ALOJA is the latest phase of a ...
  • A low-complexity, high-performance fetch unit for simultaneous multithreading processors 

    Falcón Samper, Ayose Jesús; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2004)
    Conference report
    Open Access
    Simultaneous multithreading (SMT) is an architectural technique that allows for the parallel execution of several threads simultaneously. Fetch performance has been identified as the most important bottleneck for SMT ...
  • A low cost split-issue technique to improve performance of SMT clustered VLIW processors 

    Gupta, Manoj; Sánchez Carracedo, Fermín; Llosa Espuny, José Francisco (2010)
    Conference report
    Open Access
    Very Long Instruction Word (VLIW) processors are a popular choice in the embedded domain due to their hardware simplicity, low cost and low power consumption. Simultaneous MultiThreading (SMT) is a popular technique for ...
  • AMA: asynchronous management of accelerators for task-based programming models 

    Planas, Judit; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Elsevier, 2015)
    Conference report
    Open Access
    Computational science has benefited in the last years from emerging accelerators that increase the performance of scientific simulations, but using these devices hinders the programming task. This paper presents AMA: a set ...
  • A methodology for user-oriented scalability analysis 

    Royo Vallés, María Dolores; Valero García, Miguel; González Colás, Antonio María; Marí, Carme (Institute of Electrical and Electronics Engineers (IEEE), 1997)
    Conference report
    Open Access
    Scalability analysis provides information about the effectiveness of increasing the number of resources of a parallel system. Several methods have been proposed which use different approaches to provide this information. ...
  • An Analysis of Lazy and Eager Limited Preemption Approaches under DAG-Based Global Fixed Priority Scheduling 

    Serrano, Maria A.; Melani, Alessandra; Kehr, Sebastian; Bertogna, Marko; Quiñones, Eduardo (Institute of Electrical and Electronics Engineers (IEEE), 2017-07-03)
    Conference lecture
    Open Access
    DAG-based scheduling models have been shown to effectively express the parallel execution of current many-core heterogeneous architectures. However, their applicability to real-time settings is limited by the difficulties ...