• Graph partitioning applied to DAG scheduling to reduce NUMA effects 

    Sánchez Barrera, Isaac; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2018)
    Conference lecture
    Open access
    The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with different access latencies and bandwidth rates depending on the proximity between the cores and the devices ...
  • How can we improve energy efficiency through user-directed vectorization and task-based parallelization? 

    Caminal, Helena; Caballero, Diego; Cebrián, Juan M.; Ferrer, Roger; Casas, Marc; Moreto Planas, Miquel; Martorell Bofill, Xavier; Valero Cortés, Mateo (Barcelona Supercomputing Center, 2015-05-05)
    Open access
    Heterogeneity, parallelization and vectorization are key techniques to improve the performance and energy efficiency of modern computing systems. However, programming and maintaining code for these architectures poses a ...
  • Improving cache Behavior in CMP architectures through cache partitioning techniques 

    Moreto Planas, Miquel (Universitat Politècnica de Catalunya, 2010-03-19)
    Thesis
    Open access
    The evolution of microprocessor design in the last few decades has changed significantly, moving from simple in-order single-core architectures to superscalar and vector architectures in order to extract the maximum available ...
  • Improving scalability of task-based programs 

    Brumar, Iulian; Casas, Marc; Moreto Planas, Miquel (Barcelona Supercomputing Center, 2015-05-05)
    Conference report
    Open access
    In a multi-core era, parallel programming allows further performance improvements, but with an important programmability cost. We envision that the best approach to parallel programming that can exceed the programmability, ...
  • iQ: an efficient and flexible queue-based simulation framework 

    Roca, Damian; Nemirovsky, Daniel; Casas, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo; Nemirovsky, Mario (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference report
    Open access
    Conventional system simulators are readily used by computer architects to design and evaluate their processor designs. These simulators provide reasonable levels of accuracy and execution detail but suffer from long ...
  • ITCA: Inter-Task Conflict-Aware CPU accounting for CMP 

    Luque, Carlos; Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Gioiosa, Roberto; Valero Cortés, Mateo (2010)
    Conference report
    Open access
    Chip-MultiProcessors (CMP) introduce complexities when accounting CPU utilization to processes because the progress done by a process during an interval of time highly depends on the activity of the other processes it is ...
  • ITCA: Inter-Task Conflict-Aware CPU accounting for CMPs 

    Luque, Carlos; Moreto Planas, Miquel; Cazorla, Francisco; Gioiosa, Roberto; Buyuktosunoglu, Alper; Valero Cortés, Mateo (IEEE Computer Society, 2009)
    Conference report
    Open access
    Chip-MultiProcessor (CMP) architectures are becoming more and more popular as an alternative to the traditional processors that only extract instruction-level parallelism from an application. CMPs introduce complexities ...
  • ITCA: inter-task conflict-aware CPU accounting for CMPs 

    Luque, Carlos; Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Gioiosa, Roberto; Buyuktosunoglu, Alper; Valero Cortés, Mateo (IEEE Computer Society Publications, 2009)
    Conference report
    Restricted access (publisher's policy)
  • libPRISM: an intelligent adaptation of prefetch and SMT levels 

    Ortega, Cristobal; Moreto Planas, Miquel; Casas Guix, Marc; Bertran, Ramon; Buyuktosunoglu, Alper; Eichenberger, Alexandre; Bose, Pradip (Association for Computing Machinery (ACM), 2017)
    Conference report
    Open access
    Current microprocessors include several knobs to modify the hardware behavior in order to improve performance under different workload demands. An impractical and time consuming offline profiling is needed to evaluate the ...
  • MLP-aware dynamic cache partitioning 

    Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
    Conference lecture
    Open access
    The limitation imposed by instruction-level parallelism (ILP) has motivated the use of thread-level parallelism (TLP) as a common strategy for improving processor performance. TLP paradigms such as simultaneous multithreading ...
  • Multicore resource management 

    Nesbit, Kyle J.; Smith, James E.; Moreto Planas, Miquel; Cazorla, Francisco; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (2008-06)
    Article
    Open access
    Current resource management mechanisms and policies are inadequate for future multicore systems. Instead, a hardware/software interface based on the virtual private machine abstraction would allow software policies to ...
  • MUSA: a multi-level simulation approach for next-generation HPC machines 

    Grass, Thomas; Allande, César; Armejach, Adrià; Rico, Alejandro; Ayguadé Parra, Eduard; Labarta, Jesús; Valero Cortés, Mateo; Casas, Marc; Moreto Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Restricted access (publisher's policy)
    The complexity of High Performance Computing (HPC) systems is increasing in the number of components and their heterogeneity. Interactions between software and hardware involve many different aspects which are typically ...
  • Online prediction of applications cache utility 

    Moreto Planas, Miquel; Cazorla, Francisco; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
    Conference report
    Open access
    General purpose architectures are designed to offer average high performance regardless of the particular application that is being run. Performance and power inefficiencies appear as a consequence for some programs. ...
  • On the benefits of tasking with OpenMP 

    Rico, Alejandro; Sánchez Barrera, Isaac; Joao, Jose A.; Randall, Joshua; Casas, Marc; Moreto Planas, Miquel (Springer, 2019)
    Conference report
    Restricted access (publisher's policy)
    Tasking promises a model to program parallel applications that provides intuitive semantics. In the case of tasks with dependences, it also promises better load balancing by removing global synchronizations (barriers), and ...
  • On the convergence of mainstream and mission-critical markets 

    Girbal, Sylvain; Moreto Planas, Miquel; Grasset, Arnaud; Abella Ferrer, Jaume; Quiñones, Eduardo; Cazorla Almeida, Francisco Javier; Yehia, Sami (Institute of Electrical and Electronics Engineers (IEEE), 2013)
    Conference report
    Restricted access (publisher's policy)
    The computing market has been dominated during the last two decades by the well-known convergence of the high-performance computing market and the mobile market. In this paper we witness a new type of convergence between ...
  • On the maturity of parallel applications for asymmetric multi-core processors 

    Chronaki, Kallia; Moreto Planas, Miquel; Casas, Marc; Rico, Alejandro; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Elsevier, 2019-05-01)
    Article
    Restricted access (publisher's policy)
    Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow) AMCs are able to provide high performance under the facility ...
  • PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite 

    Chasapis, Dimitrios; Casas Guix, Marc; Moreto Planas, Miquel; Vidal Ortiz, Raul; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2015-12-01)
    Article
    Open access
    In this work, we show how parallel applications can be implemented efficiently using task parallelism. We also evaluate the benefits of such parallel paradigm with respect to other approaches. We use the PARSEC benchmark ...
  • Performance and energy effects on task-based parallelized applications: User-directed versus manual vectorization 

    Caminal Pallarés, Helena; Caballero de Gea, Diego; Cebrián González, Juan Manuel; Ferrer, Roger; Casas, Marc; Moreto Planas, Miquel; Martorell Bofill, Xavier; Valero Cortés, Mateo (2018-06)
    Article
    Open access
    Heterogeneity, parallelization and vectorization are key techniques to improve the performance and energy efficiency of modern computing systems. However, programming and maintaining code for these architectures poses a ...
  • Peripheral twists for torus topologies with arbitrary aspect ratio 

    Vallejo Gutiérrez, Enrique; Moreto Planas, Miquel; Martínez, Carmen; Beivide Palacio, Julio Ramón (2011)
    Conference report
    Open access
    A torus is a common topology used in supercomputer networks. Asymmetric Tori suffer from resource usage imbalance, which translates to reduced performance. Twisted Tori employ a twist in the peripheral links of one or more ...
  • Per-task energy accounting in computing systems 

    Liu, Qixiao; Jiménez, Víctor; Moreto Planas, Miquel; Abella, Jaume; Cazorla, Francisco; Valero Cortés, Mateo (2013)
    Research report
    Open access
    We present for the first time the concept of per-task energy accounting (PTEA) and relate it to per-task energy metering (PTEM). We show the benefits of supporting both in future computing systems. Using the shared last-level ...