Universitat Politècnica de Catalunya

    • Català
    • Castellano
    • English
    • LoginRegisterLog in (no UPC users)
  • mailContact Us
  • world English 
    • Català
    • Castellano
    • English
  • userLogin   
      LoginRegisterLog in (no UPC users)

UPCommons. Global access to UPC knowledge

Spatial support vector regression to detect silent errors in the exascale era

File: Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era.pdf (863.0 KB)

Authors:
Subasi, Omer
Di, Sheng
Bautista Gomez, Leonardo
Balaprakash, Prasanna
Unsal, Osman Sabri
Labarta Mancho, Jesús José
Cristal Kestelman, Adrián
Cappello, Franck
Document type: Conference report
Date: 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems, combined with targeted power and energy budget goals, introduces significant reliability challenges. Silent data corruptions (SDCs), or silent errors, are a major source of errors that corrupt the execution results of HPC applications without being detected. In this work, we explore a low-memory-overhead SDC detector that leverages epsilon-insensitive support vector machine regression to detect SDCs in HPC applications that can be characterized by an impact error bound. The key contributions are threefold. (1) Our design incorporates spatial features (i.e., the neighbouring data values of each data point in a snapshot) into the training data, so that the memory overhead stays small (less than 1%). (2) We provide an in-depth study of the detection ability and performance under different parameters, and we carefully optimize the detection range. (3) Experiments with eight real-world HPC applications show that our detector can achieve a detection sensitivity (i.e., recall) of up to 99% while keeping the false positive rate below 1% in most cases. Our detector incurs a low performance overhead, 5% on average across all benchmarks studied in the paper. Compared with other state-of-the-art techniques, our detector exhibits the best tradeoff between detection ability and overheads.
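The detection idea summarized in the abstract (predict each data value from its spatial neighbours with an epsilon-insensitive regressor, then flag values that fall outside a detection range) can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions of its own: a 1-D snapshot, only the two immediate neighbours as features, a linear model trained by stochastic subgradient descent instead of whatever kernel and solver the paper uses, and a detection range set to the largest training residual plus a hypothetical `margin`. All function names and parameter values (`eps`, `lam`, `lr`, `epochs`, `margin`) are illustrative.

```python
"""Minimal sketch: spatial-feature, epsilon-insensitive linear SVR as an SDC detector.

Assumptions (not taken from the paper): a 1-D snapshot, the two immediate
neighbours as features, a linear model trained by stochastic subgradient
descent, and a detection range of max training residual + a hypothetical margin.
"""
import math
import random
from typing import List, Tuple

Vector = List[float]


def spatial_features(snapshot: Vector) -> Tuple[List[Vector], Vector]:
    """Features for interior point i are [snapshot[i-1], snapshot[i+1]]; target is snapshot[i].

    Boundary points are skipped for simplicity.
    """
    X, y = [], []
    for i in range(1, len(snapshot) - 1):
        X.append([snapshot[i - 1], snapshot[i + 1]])
        y.append(snapshot[i])
    return X, y


def predict(w: Vector, b: float, x: Vector) -> float:
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def train_linear_svr(X: List[Vector], y: Vector, eps: float = 0.01, lam: float = 1e-4,
                     lr: float = 0.02, epochs: int = 100, seed: int = 0) -> Tuple[Vector, float]:
    """Stochastic subgradient descent on lam/2*||w||^2 + mean(max(0, |y - (w.x + b)| - eps))."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for epoch in range(epochs):
        rng.shuffle(order)
        step = lr / (1.0 + 0.01 * epoch)  # slowly decaying step size
        for j in order:
            r = y[j] - predict(w, b, X[j])
            # Subgradient of the eps-insensitive loss w.r.t. the prediction:
            # zero inside the eps-tube, -1 / +1 when under- / over-predicting.
            g = -1.0 if r > eps else (1.0 if r < -eps else 0.0)
            w = [wk - step * (lam * wk + g * xk) for wk, xk in zip(w, X[j])]
            b -= step * g
    return w, b


def calibrate_range(X: List[Vector], y: Vector, w: Vector, b: float, margin: float) -> float:
    """Hypothetical detection range: largest training residual plus a margin."""
    return max(abs(yk - predict(w, b, xk)) for xk, yk in zip(X, y)) + margin


def detect(snapshot: Vector, w: Vector, b: float, detection_range: float) -> List[int]:
    """Indices whose observed value deviates from the prediction by more than the range."""
    X, y = spatial_features(snapshot)
    return [k + 1 for k, (xk, yk) in enumerate(zip(X, y))
            if abs(yk - predict(w, b, xk)) > detection_range]


if __name__ == "__main__":
    # Synthetic smooth field standing in for successive HPC snapshots (assumption).
    def snapshot_at(t: int) -> Vector:
        return [math.sin(0.05 * i + 0.1 * t) for i in range(200)]

    X, y = spatial_features(snapshot_at(0))   # train on a clean snapshot
    w, b = train_linear_svr(X, y)
    theta = calibrate_range(X, y, w, b, margin=0.05)

    clean = snapshot_at(1)                      # next snapshot
    corrupted = list(clean)
    corrupted[100] += 5.0                       # injected silent error

    print("clean flags:", detect(clean, w, b, theta))
    print("corrupted flags:", detect(corrupted, w, b, theta))
```

Because a corrupted value also appears in the feature vectors of its two neighbours, the points adjacent to an injected error may be flagged as well in this simplified setup. Training on one clean snapshot and checking the next is likewise an assumption of the sketch, not a description of the paper's training schedule.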
Citation: Subasi, O., Di, S., Bautista, L., Balaprakash, P., Unsal, O., Labarta, J., Cristal, A., Cappello, F. Spatial support vector regression to detect silent errors in the exascale era. In: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. "2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016: 16-19 May 2016, Cartagena, Colombia: proceedings". Cartagena: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 413-424.
URI: http://hdl.handle.net/2117/97167
DOI: 10.1109/CCGrid.2016.33
ISBN: 978-1-5090-2452-0
Publisher version: http://ieeexplore.ieee.org/document/7515717/
Collections
  • Computer Sciences - Ponències/Comunicacions de congressos [501]
  • CAP - Grup de Computació d'Altes Prestacions - Ponències/Comunicacions de congressos [782]
  • Departament d'Arquitectura de Computadors - Ponències/Comunicacions de congressos [1.849]

© UPC. Servei de Biblioteques, Publicacions i Arxius

info.biblioteques@upc.edu