
  • Accelerating K-mer Frequency Counting with GPU and Non-Volatile Memory 

    Cadenelli, Nicola; Polo, Jordà; Carrera, David (IEEE, 2018-02-15)
    Conference lecture
    Open Access
    The emergence of Next Generation Sequencing (NGS) platforms has increased the throughput of genomic sequencing and in turn the amount of data that needs to be processed, requiring highly efficient computation for its ...
  • Accelerating scientific applications on GPUs 

    Farré Gonzalez, Pau (Universitat Politècnica de Catalunya, 2016-07-04)
    Master thesis
    Open Access
    We have analyzed and accelerated two large scientific applications used at the Barcelona Supercomputing Center (BSC). With this, we want to show how two complex applications can be efficiently ported to GPUs. In addition, ...
  • Advances in GPU architecture for deep learning and scientific computing 

    Parienté, Frédéric (Barcelona Supercomputing Center, 2016-09-10)
    Conference report
    Open Access
    The talk will cover the recent NVIDIA product announcements made at the GTC'16 conference, and how the Pascal GPU and NVLink interconnect technologies greatly improve multi-GPU performance and efficiency in deep learning ...
  • A low-power, high-performance speech recognition accelerator 

    Yazdani, Reza; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2019-12-01)
    Article
    Open Access
    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, which mobile devices with tight power budgets cannot afford. ...
  • AMA: asynchronous management of accelerators for task-based programming models 

    Planas, Judit; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Elsevier, 2015)
    Conference report
    Open Access
    Computational science has benefited in recent years from emerging accelerators that increase the performance of scientific simulations, but using these devices complicates programming. This paper presents AMA: a set ...
  • An open benchmark implementation for multi-CPU multi-GPU pedestrian detection in automotive systems 

    Trompouki, Matina M.; Kosmidis, Leonidas; Navarro, Nacho (IEEE, 2017-12-14)
    Conference lecture
    Open Access
    Modern and future automotive systems incorporate several Advanced Driving Assistance Systems (ADAS). Those systems require significant performance that cannot be provided with traditional automotive processors and programming ...
  • A unified memory approach to GPU acceleration on task based programming models 

    Rodriguez, Aimar; Beltran Querol, Vicenç (Barcelona Supercomputing Center, 2018-04-24)
    Conference report
    Open Access
  • Benchmarking CPUs and GPUs on embedded platforms for software receiver usage 

    Pany, T.; Dampf, J.; Bär, W.; Winkel, J.; Stöber, C.; Fürlinger, K.; Closas Gómez, Pau; García Molina, J. A. (2015)
    Conference report
    Open Access
    Smartphones containing multi-core central processing units (CPUs) and powerful many-core graphics processing units (GPUs) bring supercomputing technology into your pocket (or into embedded devices). This can be exploited ...
  • Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster 

    Campos, Victor; Sastre, Francesc; Yagües, Maurici; Bellver, Míriam; Giró Nieto, Xavier; Torres Viñals, Jordi (Elsevier, 2017)
    Article
    Open Access
    Deep learning algorithms owe their success to high-capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so ...
  • Efficient data sharing on heterogeneous systems 

    García-Flores, Víctor; Ayguadé Parra, Eduard; Peña, Antonio J. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference report
    Restricted access - publisher's policy
    General-purpose computing on GPUs has become more accessible due to features such as shared virtual memory and demand paging. Unfortunately, this comes at a price: performance. Automatic memory management is ...
  • Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization 

    Arnau Montañés, José María; Parcerisa Bundó, Joan Manuel; Xekalakis, Polychronis (2014)
    Conference report
    Restricted access - publisher's policy
    Redundancy is at the heart of graphical applications. In fact, generating an animation typically involves rendering a succession of extremely similar images. In terms of rendering these images, this behavior translates into the ...
  • Enabling preemptive multiprogramming on GPUs 

    Tanasic, Ivan; Gelado Fernandez, Isaac; Cabezas, Javier; Ramírez Bellido, Alejandro; Navarro, Nacho; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Open Access
    GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications, from one or several users. ...
  • Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications 

    García Flores, Víctor; Gomez Luna, J.; Grass, Thomas Dieter; Rico, Alejandro; Ayguadé Parra, Eduard; Pena, A. J. (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Restricted access - publisher's policy
    Heterogeneous systems are ubiquitous in the field of High-Performance Computing (HPC). Graphics processing units (GPUs) are widely used as accelerators for their enormous computing potential and energy efficiency; ...
  • GPU-based low-level image processing for object recognition using HDR images 

    Dominguez Tejera, Jonatan (Universitat Politècnica de Catalunya / Karlsruher Institut für Technologie, 2012-06-15)
    Master thesis (pre-Bologna period)
    Open Access
    Covenantee: Karlsruher Institut für Technologie
    The algorithm developed for object recognition using HDR images is divided into three modules. The first obtains noise-free images of the scene using a fusion method based on the pdf of each pixel ...
  • High-Integrity GPU Designs for Critical Real-Time Automotive Systems 

    Alcaide, Sergi; Kosmidis, Leonidas; Hernandez, Carles; Abella, Jaume (IEEE, 2019-04-16)
    Conference lecture
    Open Access
    Autonomous Driving (AD) requires high-performance hardware, such as GPUs, to perform object recognition and tracking in real time. However, unlike the consumer electronics market, critical real-time AD ...
  • Implementation of a GPU rasterization stage on a FPGA 

    Navarro Torrentó, Albert (Universitat Politècnica de Catalunya, 2015-10-26)
    Bachelor thesis
    Open Access
    In this work, we describe the steps followed to implement the rasterization stage of a GPU.
  • LTE downlink physical layer processing chain SDR application acceleration with GPUs 

    Arteaga Martínez, Xavier (Universitat Politècnica de Catalunya, 2012-07-13)
    Bachelor thesis
    Open Access
    Technology moves fast, and wireless systems are increasingly implemented as software-defined radio (SDR). New wireless standards increase the efficiency of communications but also their complexity, which demands more processing. The ...
  • On the fly best view detection using graphics hardware 

    Vázquez Alcocer, Pere Pau; Sbert Cassasayas, Mateu (2004)
    Conference report
    Restricted access - publisher's policy
    Selection of good camera positions has many applications in Computer Graphics. It can be used to compute a walkthrough inside a scene that shows more information or to select a minimal set of views for ...
  • Optimisation opportunities and evaluation for GPGPU applications on low-end mobile GPUs 

    Trompouki, Matina M.; Kosmidis, Leonidas (Institute of Electrical and Electronics Engineers (IEEE), 2017-05-15)
    Conference lecture
    Open Access
    Previous works in the literature have shown the feasibility of general purpose computations for non-visual applications on low-end mobile graphics processors using graphics APIs. These works focused only on the functional ...
  • Parallelizing general histogram application for CUDA architectures 

    Milic, Ugljesa; Gelado Fernandez, Isaac; Puzovic, Nikola; Ramírez Bellido, Alejandro; Tomasevic, Milo (IEEE Computational Intelligence Society, 2013)
    Conference report
    Restricted access - publisher's policy
    Histogramming is a tool commonly used in data analysis. Although its serial version is simple to implement, providing an efficient and scalable way to parallelize it can be challenging. This especially holds in the case of ...