Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


Seamless optimization of the GEMM kernel for task-based programming models

Cite as: hdl:2117/369338

Lorenzon, Arthur F.
Marques, Sandro M. V. N.
Navarro Muñoz, Antoni
Beltran Querol, Vicenç
Document type: Conference report
Defense date: 2022
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
The general matrix-matrix multiplication (GEMM) kernel is a fundamental building block of many scientific applications. Many libraries such as Intel MKL and BLIS provide highly optimized sequential and parallel versions of this kernel. The parallel implementations of the GEMM kernel rely on the well-known fork-join execution model to exploit multi-core systems efficiently. However, these implementations are not well suited for task-based applications as they break the data-flow execution model. In this paper, we present a task-based implementation of the GEMM kernel that can be seamlessly leveraged by task-based applications while providing better performance than the fork-join version. Our implementation leverages several advanced features of the OmpSs-2 programming model and a new heuristic to select the best parallelization strategy and blocking parameters based on the matrix and hardware characteristics. When evaluating the performance and energy consumption on two modern multi-core systems, we show that our implementations provide significant performance improvements over an optimized OpenMP fork-join implementation, and can beat vendor implementations of the GEMM (e.g., Intel MKL and AMD AOCL). We also demonstrate that a real application can leverage our optimized task-based implementation to enhance performance.
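The contrast the abstract draws between fork-join and task-based execution can be illustrated with a small blocked GEMM. This is only a minimal sketch under stated assumptions, not the authors' implementation: it uses Python threads as a stand-in for a task runtime rather than OmpSs-2, it fixes a single block size instead of applying the paper's heuristic, and the names `tile_task`, `tile_chain`, and `blocked_gemm` are hypothetical. Each output tile of C becomes an independent chain of tasks, and updates to the same tile run in order, which mimics a read-write dependency on that tile.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_task(A, B, C, i0, j0, k0, bs):
    """One task: update the C tile at (i0, j0) with the k0-th block product."""
    i_end = min(i0 + bs, len(C))
    j_end = min(j0 + bs, len(C[0]))
    k_end = min(k0 + bs, len(B))
    for i in range(i0, i_end):
        for k in range(k0, k_end):
            a = A[i][k]
            for j in range(j0, j_end):
                C[i][j] += a * B[k][j]


def tile_chain(A, B, C, i0, j0, bs):
    """Tasks that touch the same C tile run one after another (assumed dependency)."""
    for k0 in range(0, len(B), bs):
        tile_task(A, B, C, i0, j0, k0, bs)


def blocked_gemm(A, B, bs=2, workers=4):
    """Compute C = A * B; chains for different C tiles may run concurrently."""
    m, n = len(A), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(tile_chain, A, B, C, i0, j0, bs)
            for i0 in range(0, m, bs)
            for j0 in range(0, n, bs)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(blocked_gemm(A, B, bs=2))
```

Because every chain writes to a disjoint tile of C, no locking is needed in this sketch. A real task-based runtime would express the same ordering through declared data dependencies rather than explicit chains.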
Citation: Lorenzon, A. [et al.]. Seamless optimization of the GEMM kernel for task-based programming models. In: International Conference on Supercomputing. "Proceedings of the 36th ACM International Conference on Supercomputing (ICS-2022): virtual event, June 27–30, 2022". New York: Association for Computing Machinery (ACM), 2022, ISBN 978-1-4503-9281-5. DOI 10.1145/3524059.3532385.
URI: http://hdl.handle.net/2117/369338
DOI: 10.1145/3524059.3532385
ISBN: 978-1-4503-9281-5
Publisher version: https://dl.acm.org/doi/10.1145/3524059.3532385
Collections
  • Doctorat en Arquitectura de Computadors - Ponències/Comunicacions de congressos [232]

Files
ICS_BLIS_2022 (1).pdf (PDF, 3,610Mb)


© UPC. Servei de Biblioteques, Publicacions i Arxius
