Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

  •   DSpace Home
  • E-prints
  • Centres de recerca
  • BSC - Barcelona Supercomputing Center
  • Computer Sciences
  • Ponències/Comunicacions de congressos
  • View Item

Cost-aware prediction of uncorrected DRAM errors in the field

Cite as: hdl:2117/341921

Boixaderas Coderch, Isaac
Živanovič, Darko
Moré Codina, Sergi
Bartolomé Rodríguez, Javier
Vicente Dorca, David
Casas Guix, Marc
Carpenter, Paul Matthew
Radojković, Petar
Ayguadé Parra, Eduard
Document type: Conference report
Defense date: 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Project: EuroEXA - Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon (EC-H2020-754337)
Abstract
This paper presents and evaluates a method to predict DRAM uncorrected errors, a leading cause of hardware failures in large-scale HPC clusters. The method uses a random forest classifier, which was trained and evaluated using error logs from two years of production of the MareNostrum 3 supercomputer. By enabling the system to take measures to mitigate node failures, our method reduces lost compute time by up to 57%, a net saving of 21,000 node-hours per year. We release all source code as open source. We also discuss and clarify aspects of methodology that are essential for a DRAM prediction method to be useful in practice. We explain why standard evaluation metrics, such as precision and recall, are insufficient, and base the evaluation on a cost-benefit analysis. This methodology can help ensure that any DRAM error predictor is free from training bias and has a clear cost-benefit calculation.
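As a rough illustration of why precision and recall alone may not settle whether a predictor is worthwhile, the sketch below evaluates one fixed confusion matrix under two mitigation costs. It is not the paper's cost model: the function `net_saving`, the `Costs` fields, the assumption that a mitigated error loses no compute time, and every number are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Both fields are hypothetical quantities, not values from the paper.
    ue_cost: float          # node-hours lost per unmitigated uncorrected error
    mitigation_cost: float  # node-hours spent each time mitigation is triggered


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard precision and recall from confusion-matrix counts."""
    return tp / (tp + fp), tp / (tp + fn)


def net_saving(tp: int, fp: int, fn: int, costs: Costs) -> float:
    """Node-hours saved relative to running with no predictor.

    Simplifying assumption: a correctly predicted error (true positive)
    is fully mitigated, so it costs only the mitigation itself.
    """
    baseline = (tp + fn) * costs.ue_cost
    with_predictor = fn * costs.ue_cost + (tp + fp) * costs.mitigation_cost
    return baseline - with_predictor


# One predictor, hence one precision/recall pair ...
tp, fp, fn = 50, 450, 50
precision, recall = precision_recall(tp, fp, fn)

# ... evaluated under a cheap and an expensive mitigation action.
cheap = net_saving(tp, fp, fn, Costs(ue_cost=100.0, mitigation_cost=2.0))
expensive = net_saving(tp, fp, fn, Costs(ue_cost=100.0, mitigation_cost=20.0))

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"net saving, cheap mitigation: {cheap} node-hours")
print(f"net saving, expensive mitigation: {expensive} node-hours")
```

Under these made-up numbers the same precision and recall correspond to a net gain when mitigation is cheap and a net loss when it is expensive, which is the kind of distinction a cost-benefit evaluation is meant to expose.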
Citation: Boixaderas, I. [et al.]. Cost-aware prediction of uncorrected DRAM errors in the field. A: International Conference for High Performance Computing, Networking, Storage and Analysis. "Proceedings of SC20: The International Conference for High Performance Computing, Networking, Storage and Analysis: Virtual Event, November 9-19, 2020". Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 1-15. ISBN 978-1-7281-9998-6. DOI 10.1109/SC41405.2020.00065.
URI: http://hdl.handle.net/2117/341921
DOI: 10.1109/SC41405.2020.00065
ISBN: 978-1-7281-9998-6
Publisher version: https://ieeexplore.ieee.org/abstract/document/9355321/
Collections
  • Computer Sciences - Ponències/Comunicacions de congressos [500]
  • CAP - Grup de Computació d'Altes Prestacions - Ponències/Comunicacions de congressos [782]
  • Departament d'Arquitectura de Computadors - Ponències/Comunicacions de congressos [1.847]

Files: UE-Prediction_print.pdf (3,238Mb, PDF)
