
Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


A study on data deduplication in HPC storage systems

Cite as:
hdl:2117/20270

Meister, Dirk
Kaiser, Jürgen
Brinkmann, Andre
Cortés, Toni
Kuhn, Michael
Kunkel, Julian
Document type: Conference report
Defense date: 2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Restricted access - publisher's policy
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Deduplication is a storage saving technique that is highly successful in enterprise backup environments. On a file system, a single data block might be stored multiple times across different files; for example, multiple versions of a file might exist that are mostly identical. With deduplication, this data replication is located and the redundancy is removed: by storing data just once, all files that use identical regions refer to the same unique data. The most common approach splits file data into chunks and calculates a cryptographic fingerprint for each chunk. By checking whether the fingerprint has already been stored, a chunk is classified as redundant or unique. Only unique chunks are stored. This paper presents the first study on the potential of data deduplication in HPC centers, which are among the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction for four data centers (BSC, DKRZ, RENCI, RWTH). In contrast to previous deduplication studies focusing mostly on backup data, we have analyzed over one PB (1212 TB) of online file system data. The evaluation shows that typically 20% to 30% of this online data can be removed by applying data deduplication techniques, with peaks of up to 70% for some data sets. This reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons only lead to small capacity savings.
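The chunk-and-fingerprint approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' tooling: the fixed 8 KiB chunk size, the choice of SHA-256 as the fingerprint, and the names `dedup_stats` and `savings` are assumptions made only for this illustration, and the record does not state which chunking method or hash function the study used.

```python
import hashlib


def dedup_stats(files, chunk_size=8192):
    """Return (total_bytes, unique_bytes) for an iterable of bytes objects.

    Hypothetical helper: fixed-size chunking and SHA-256 are assumptions
    for this sketch, not details taken from the paper.
    """
    seen = set()   # fingerprints of chunks that are already "stored"
    total = 0      # bytes seen before deduplication
    unique = 0     # bytes that would actually be stored
    for data in files:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            total += len(chunk)
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint not in seen:
                # First time this content appears: classify as unique.
                seen.add(fingerprint)
                unique += len(chunk)
            # Otherwise the chunk is redundant and is not stored again.
    return total, unique


def savings(total, unique):
    """Fraction of capacity removed by deduplication (0.0 for empty input)."""
    return 0.0 if total == 0 else 1.0 - unique / total
```

As a usage example, take two versions of a file that share their first chunk: `v1 = b"A" * 8192 + b"B" * 8192` and `v2 = b"A" * 8192 + b"C" * 8192`. Calling `dedup_stats([v1, v2])` returns `(32768, 24576)`, so `savings` reports 0.25. A whole-file comparison would save nothing here because the two files differ, which mirrors the abstract's point that the reduction depends on a subfile approach.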
Citation: Meister, D. [et al.]. A study on data deduplication in HPC storage systems. In: International Conference for High Performance Computing, Networking, Storage and Analysis. "2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)". Salt Lake City, Utah: Institute of Electrical and Electronics Engineers (IEEE), 2012, p. 1-11.
URI: http://hdl.handle.net/2117/20270
DOI: 10.1109/SC.2012.14
ISBN: 978-1-4673-0806-9
Collections
  • CAP - Grup de Computació d'Altes Prestacions - Ponències/Comunicacions de congressos [782]
  • Departament d'Arquitectura de Computadors - Ponències/Comunicacions de congressos [1.849]

Files
sc2012_hpc_dedup.pdf (322.4 KB, PDF, restricted access)
