    • A cost-based storage format selector for materialized results in big data frameworks 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2019-05-08)
      Article
      Open Access
      Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of different users share many common tasks (i.e, ...
    • A general guide to applying machine learning to computer architecture 

      Nemirovsky, Daniel; Arkose, Tugberk; Markovic, Nikola; Nemirovsky, Mario; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2018)
      Article
      Open Access
      The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data which ...
    • A methodology for Spark parameter tuning 

      Gounaris, Anastasios; Torres Viñals, Jordi (2017-05-19)
      Article
      Open Access
      Spark has been established as an attractive platform for big data analysis, since it manages to hide most of the complexities related to parallelism, fault tolerance and cluster setting from developers. However, this comes ...
    • A new reliability-based data-driven approach to simulation-based models 

      Ayensa Jiménez, Jacobo; Doweidar, Mohamed Hamdy; Doblaré Castellano, Manuel (Barcelona Supercomputing Center, 2017-05-04)
      Conference report
      Open Access
      Data Science has burst into simulation-based engineering sciences with an impressive impulse. However, data are never uncertainty-free and a suitable approach is needed to face data measurement errors and their intrinsic ...
    • A programming model for hybrid workflows: combining task-based workflows and dataflows all-in-one 

      Ramón Cortés, Cristian; Lordan Gomis, Francesc; Ejarque Artigas, Jorge; Badia Sala, Rosa Maria (Elsevier, 2020-12)
      Article
      Restricted access - publisher's policy
      In the past years, e-Science applications have evolved from large-scale simulations executed in a single cluster to more complex workflows where these simulations are combined with High-Performance Data Analytics (HPDA). ...
    • A Quick View on Current Techniques and Machine Learning Algorithms for Big Data Analytics 

      Berral García, Josep Lluís (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Open Access
      Big-data is an excellent source of knowledge and information from our systems and clients, but dealing with such an amount of data requires automation, and this brings us to data mining and machine learning techniques. In ...
    • A scalable synthetic traffic model of Graph500 for computer networks analysis 

      Fuentes Sáez, Pablo; Benito, Mariano; Vallejo, Enrique; Bosque Orero, José Luis; Beivide Palacio, Ramon; Anghel, Andreea; Rodríguez Herrera, Germán; Gusat, Mitch; Minkenberg, Cyriel; Valero Cortés, Mateo (2017-12-25)
      Article
      Open Access
      The Graph500 benchmark attempts to steer the design of High-Performance Computing systems to maximize the performance under memory-constricted application workloads. A realistic simulation of such benchmarks for architectural ...
    • A software reference architecture for semantic-aware big data systems 

      Nadal Francesch, Sergi; Herrero Otal, Víctor; Romero Moral, Óscar; Abelló Gamazo, Alberto; Franch Gutiérrez, Javier; Vansummeren, Stijn; Valerio, Danilo (2016-06-13)
      Article
      Open Access
      Context: Big Data systems are a class of software systems that ingest, store, process and serve massive amounts of heterogeneous data, from multiple sources. Despite their undisputed impact in current society, their ...
    • Accelerating Hash-Based Query Processing Operations on FPGAs by a Hash Table Caching Technique 

      Salami, Behzad; Arcas-Abella, Oriol; Sonmez, Nehir; Unsal, Osman; Cristal Kestelman, Adrián (Springer International Publishing, 2017-04-29)
      Conference lecture
      Open Access
      Extracting valuable information from the rapidly growing field of Big Data faces serious performance constraints, especially in the software-based database management systems (DBMS). In a query processing system, hash-based ...
    • ALOJA: A benchmarking and predictive platform for big data performance analysis 

      Poggi, Nicolas; Berral García, Josep Lluís; Carrera Pérez, David (Springer, 2016)
      Conference report
      Open Access
      The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of cost-effectiveness of Big Data deployments. The development of the project over its first year has resulted in ...
    • ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments 

      Berral García, Josep Lluís; Poggi Mastrokalo, Nicolas; Carrera Pérez, David; Call, Aaron; Reinauer, Rob; Green, Daron (Institute of Electrical and Electronics Engineers (IEEE), 2015-10)
      Article
      Open Access
      This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and ...
    • ALOJA: a systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness 

      Poggi Mastrokalo, Nicolas; Carrera Pérez, David; Call, Aaron; Mendoza, Sergio; Becerra Fontal, Yolanda; Torres Viñals, Jordi; Ayguadé Parra, Eduard; Gagliardi, Fabrizio; Labarta Mancho, Jesús José; Reinauer, Rob; Vujic, Nikola; Green, Daron; Blakeley, Jose (Institute of Electrical and Electronics Engineers (IEEE), 2014)
      Conference report
      Open Access
      This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of cost-effectiveness of Hadoop deployments and reports its initial results. ALOJA is the latest phase of a ...
    • An analysis of Bicing mobility patterns using big data 

      Manchón Contreras, Oriol (Universitat Politècnica de Catalunya, 2016-06-23)
      Master thesis
      Open Access
      Nowadays, technology advances really fast and so does the generation of data. Almost all electronic devices are constantly generating and sharing a huge amount of data through the World Wide Web. Moreover, recent policies ...
    • An integration-oriented ontology to govern evolution in big data ecosystems 

      Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Vassiliadis, Panos; Vansummeren, Stijn (CEUR-WS.org, 2017)
      Conference lecture
      Open Access
      Big Data architectures allow to flexibly store and process heterogeneous data, from multiple sources, in its original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving, ...
    • An integration-oriented ontology to govern evolution in big data ecosystems 

      Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Vassiliadis, Panos; Vansummeren, Stijn (2019-01)
      Article
      Open Access
      Big Data architectures allow to flexibly store and process heterogeneous data, from multiple sources, in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving. ...
    • Anàlisi d'arxius Big Data aplicant mostreig 

      Moragues Cabanes, Marta (Universitat Politècnica de Catalunya, 2015-01-20)
      Master thesis (pre-Bologna period)
      Open Access
      The project is a study on optimizing the resources, cost, and execution time of a Hadoop job by implementing sampling techniques.
    • Anàlisi del comerç en línia a partir de l'estudi del client i les seves valoracions 

      Villalobos Guiral, Albert (Universitat Politècnica de Catalunya, 2017-06)
      Bachelor thesis
      Restricted access - confidentiality agreement
    • Análisis de ventajas en el uso de una BDD noSQL para un sistema de Monitorización y Trazabilidad de Operaciones de Negocio 

      García Calatrava, Carlos (Universitat Politècnica de Catalunya, 2015-06)
      Bachelor thesis
      Restricted access - confidentiality agreement
    • Apache Mahout’s k-Means vs. fuzzy k-Means performance evaluation 

      Xhafa Xhafa, Fatos; Bogza, Adriana; Caballé Llobet, Santiago; Barolli, Leonard (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Open Access
      The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has brought many issues of how to extract and make use of the knowledge obtained from the data within short times, limited ...
    • Aplicación web para la gestión de información en materia de seguridad en una corporación 

      Valentí Mañé, Marcos (Universitat Politècnica de Catalunya, 2016-10-26)
      Bachelor thesis
      Restricted access - confidentiality agreement