Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


Cardinality Estimation in Shared-Nothing Parallel Dataflow Engines

108938.pdf (709,0Kb)
Cite as:
hdl:2117/77883

Mendt Peters, Tamara Desiree
Tutor / director: Abelló Gamazo, Alberto
Document type: Master thesis
Date: 2015-07-31
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Shared-nothing parallel dataflow systems aim to bridge the gap between MapReduce and RDBMSs by combining parallel execution of second-order functions with operator-based optimizations. In parallel systems, job latency is strongly affected by data shuffling and unbalanced data across nodes, thus the degree of parallelism and the data partitioning functions must be carefully considered when choosing optimization strategies. However, it is hard to make good optimization choices without any information about the distribution of the data. We attempt to overcome this challenge in shared-nothing parallel dataflows by tracking statistics of data sets during query runtime. We use data streaming algorithms to track statistics so as to affect job latency as little as possible. We discuss how collected statistics can potentially be used to improve execution plans during runtime.
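
As an illustration of the kind of one-pass, bounded-memory statistic the abstract refers to, the following is a minimal sketch of a K-Minimum-Values (KMV) distinct-count estimator. The abstract does not say which streaming algorithms the thesis uses, so the choice of KMV, the class name `KMVSketch`, and the parameter `k` are assumptions made here for illustration only.

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch (not taken from the thesis).

    Keeps only the k smallest hash values seen so far, so memory stays
    bounded no matter how many records stream through an operator.
    """

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes: a max-heap over the k smallest
        self._members = set()  # hash values currently held in the heap

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash(item)
        if h in self._members:
            return  # duplicate of a retained value: nothing to update
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New value is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        # With fewer than k distinct hashes the count is exact.
        if len(self._heap) < self.k:
            return float(len(self._heap))
        # Otherwise use the standard KMV estimator (k - 1) / k-th minimum.
        return (self.k - 1) / (-self._heap[0])


if __name__ == "__main__":
    sketch = KMVSketch(k=256)
    for record in range(10_000):
        sketch.add(record)
        sketch.add(record)  # duplicates do not change the estimate
    print(f"estimated distinct values: {sketch.estimate():.0f}")
```

A sketch of this kind could be updated once per record as data passes through an operator, which is in the spirit of tracking statistics with little effect on job latency. The relative error shrinks roughly as one over the square root of `k`.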
Subjects: Parallel computers, Ordinadors paral·lels
Degree: MÀSTER UNIVERSITARI ERASMUS MUNDUS EN TECNOLOGIES DE LA INFORMACIÓ PER A LA INTEL·LIGÈNCIA EMPRESARIAL (Pla 2012)
URI: http://hdl.handle.net/2117/77883
Collections
  • Màsters oficials - Màster universitari Erasmus Mundus en Tecnologies de la Informació per a la Intel·ligència Empresarial (IT4BI) [18]

© UPC. Servei de Biblioteques, Publicacions i Arxius

info.biblioteques@upc.edu
