    • A hardware/software co-design of K-mer counting using a CAPI-enabled FPGA 

      Haghi, Abbas; Álvarez Martí, Lluc; Polo Bardés, Jorda; Diamantopoulos, Dionysios; Hagleitner, Christoph; Moretó Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      Advances in Next Generation Sequencing (NGS) technologies have caused the proliferation of genomic applications to detect DNA mutations and guide personalized medicine. These applications have an enormous computational ...
    • A review of CNN accelerators for embedded systems based on RISC-V 

      Sanchez Flores, Alejandra; Álvarez Martí, Lluc; Alorda Ladaria, Bartomeu (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference report
      Restricted access - publisher's policy
      One of the great challenges of computing today is sustainable energy consumption. In the deployment of edge computing this challenge is particularly important considering the use of embedded equipment with limited energy ...
    • A review of CNN accelerators for embedded systems based on RISC-V 

      Sanchez Flores, Alejandra; Álvarez Martí, Lluc; Alorda Ladaria, Bartomeu (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference lecture
      Open Access
      One of the great challenges of computing today is sustainable energy consumption. In the deployment of edge computing this challenge is particularly important considering the use of embedded equipment with limited energy ...
    • A two level neural approach combining off-chip prediction with adaptive prefetch filtering 

      Jamet, Alexandre Valentin; Vavouliotis, Georgios; Jiménez, Daniel A.; Álvarez Martí, Lluc; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2024)
      Conference report
      Open Access
      To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether ...
    • Accelerating edit-distance sequence alignment on GPU using the wavefront algorithm 

      Aguado Puig, Quim; Marco Sola, Santiago; Moure López, Juan Carlos; Castells Rufas, David; Álvarez Martí, Lluc; Espinosa Morales, Antonio; Moretó Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2022-06-10)
      Article
      Open Access
      Sequence alignment remains a fundamental problem with practical applications ranging from pattern recognition to computational biology. Traditional algorithms based on dynamic programming are hard to parallelize, require ...
    • An FPGA accelerator of the wavefront algorithm for genomics pairwise alignment 

      Haghi, Abbas; Marco Sola, Santiago; Álvarez Martí, Lluc; Diamantopoulos, Dionysios; Hagleitner, Christoph; Moretó Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference report
      Open Access
      In the last years, advances in next-generation sequencing technologies have enabled the proliferation of genomic applications that guide personalized medicine. These applications have an enormous computational cost due to ...
    • Architectural support for task dependence management with flexible software scheduling 

      Castillo, Emilio; Álvarez Martí, Lluc; Moretó Planas, Miquel; Casas, Marc; Vallejo, Enrique; Bosque, Jose L.; Beivide Palacio, Ramon; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2018)
      Conference report
      Open Access
      The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its ...
    • CATA: Criticality aware task acceleration for multicore processors 

      Castillo, Emilio; Moretó Planas, Miquel; Casas, Marc; Álvarez Martí, Lluc; Vallejo, Enrique; Chronaki, Kallia; Badia Sala, Rosa Maria; Bosque Orero, José Luis; Beivide Palacio, Julio Ramón; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Open Access
      Managing criticality in task-based programming models opens a wide range of performance and power optimization opportunities in future manycore systems. Criticality aware task schedulers can benefit from these opportunities ...
    • Characterizing the impact of last-level cache replacement policies on big-data workloads 

      Jamet, Alexandre Valentin; Álvarez Martí, Lluc; Jiménez, Daniel A.; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      The vast disparity between Last Level Cache (LLC) and memory latencies has motivated the need for efficient cache management policies. The computer architecture literature abounds with work on LLC replacement policy. ...
    • Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures 

      Álvarez Martí, Lluc; Vilanova, Lluís; Moretó Planas, Miquel; Casas, Marc; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
      Conference report
      Open Access
      The increasing number of cores in manycore architectures causes important power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a ...
    • Energy and precision evaluation of a systolic array accelerator using a quantization approach for edge computing 

      Sanchez Flores, Alejandra; Fornt Mas, Jordi; Álvarez Martí, Lluc; Alorda Ladaria, Bartomeu (2024-07-18)
      Article
      Open Access
      This paper focuses on the implementation of a neural network accelerator optimized for speed and energy efficiency, for use in embedded machine learning. Specifically, we explore power reduction at the hardware level through ...
    • Hardware-software coherence protocol for the coexistence of caches and local memories 

      Álvarez Martí, Lluc; Vilanova, Lluís; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard (2015-01-01)
      Article
      Open Access
      Cache coherence protocols limit the scalability of multicore and manycore architectures and are responsible for an important amount of the power consumed in the chip. A good way to alleviate these problems is to introduce ...
    • Intelligent adaptation of hardware knobs for improving performance and power consumption 

      Ortega Carrasco, Cristobal; Álvarez Martí, Lluc; Casas, Marc; Bertran, Ramon; Buyuktosunoglu, Alper; Eichenberger, Alexandre; Bose, Pradip; Moretó Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2021-01-01)
      Article
      Open Access
      Current microprocessors include several knobs to modify the hardware behavior in order to improve performance, power, and energy under different workload demands. An impractical and time consuming offline profiling is ...
    • OpenPiton optimizations towards high performance manycores 

      Leyva Santes, Neiel Israel; Monemi, Alireza; Oliete Escuín, Noelia; López Paradís, Guillem; Abancens Calvo, Xabier; Balkind, Jonathan; Vallejo Gutiérrez, Enrique; Moretó Planas, Miquel; Álvarez Martí, Lluc (Association for Computing Machinery (ACM), 2023)
      Conference report
      Open Access
      In recent years, numerous multicore RISC-V platforms have emerged. Within the RISC-V ecosystem, Networks-on-Chip (NoCs) such as OpenPiton are employed in designs that aim to scale to a large number of cores. This paper ...
    • Page size aware cache prefetching 

      Vavouliotis, Georgios; Chacon, Gino; Álvarez Martí, Lluc; Gratz, Paul V.; Jiménez, Daniel A.; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference report
      Open Access
      The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and ...
    • Peachy Parallel Assignments (EduHPC 2018) 

      Ayguadé Parra, Eduard; Álvarez Martí, Lluc; Banchelli Gracia, Fabio; Burtscher, Martin; González Escribano, Arturo; Gutiérrez Monge, Julián; Joiner, David A.; Kaeli, David; Previlon, Fritz; Rodríguez Gutiez, Eduardo; Bunde, David P. (Institute of Electrical and Electronics Engineers (IEEE), 2018)
      Conference report
      Open Access
      Peachy Parallel Assignments are a resource for instructors teaching parallel and distributed programming. These are high-quality assignments, previously tested in class, that are readily adoptable. This collection of ...
    • Practically tackling memory bottlenecks of graph-processing workloads 

      Jamet, Alexandre Valentin; Vavouliotis, Georgios; Jiménez, Daniel A.; Álvarez Martí, Lluc; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2024)
      Conference report
      Open Access
      Graph-processing workloads have become widespread due to their relevance on a wide range of application domains such as network analysis, path-planning, bioinformatics, and machine learning. Graph-processing workloads ...
    • Pushing the envelope on free TLB prefetching 

      Vavouliotis, Georgios; Álvarez Martí, Lluc; Casas, Marc (Barcelona Supercomputing Center, 2021-05)
      Conference report
      Open Access
      Frequent Translation Lookaside Buffer (TLB) misses pose significant performance and energy overheads due to page walks required for fetching the translations. The address translation performance bottleneck is further ...
    • Reducing cache coherence traffic with a NUMA-aware runtime approach 

      Caheny, Paul; Álvarez Martí, Lluc; Derradji, Said; Valero Cortés, Mateo; Moretó Planas, Miquel; Casas, Marc (2018-05)
      Article
      Open Access
      Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...
    • Runtime-assisted cache coherence deactivation in task parallel programs 

      Caheny, Paul; Álvarez Martí, Lluc; Valero Cortés, Mateo; Moretó Planas, Miquel; Casas, Marc (Association for Computing Machinery (ACM), 2018)
      Conference report
      Open Access
      With increasing core counts, the scalability of directory-based cache coherence has become a challenging problem. To reduce the area and power needs of the directory, recent proposals reduce its size by classifying data ...