Computing methods for parallel processing and analysis on complex networks
Document type: Master thesis
Rights access: Open Access
Nowadays, solving certain problems requires modeling complex systems in order to simulate and understand their behavior. A good example of such a complex system is the Facebook social network, which represents people and their relationships. Another example is the Internet, composed of a vast number of servers, computers, modems, and routers. Every science field (physics, economics, politics, and so on) has complex systems, which are complex because of the large volume of data required to represent them and the rapid change of their structure. Analyzing the behavior of these complex systems is important for creating simulations or discovering dynamics over them, with the main goal of understanding how they work. Some complex systems cannot be easily modeled; we can begin by analyzing their structure. This is possible by creating a network model, mapping the problem's entities and the relations between them. Some popular analyses over the structure of a network are: • Community detection – discovering how the entities are grouped. • Identifying the most important entities – measuring a node's influence over the network. • Features of the whole network – such as the diameter, number of triangles, clustering coefficient, and the shortest path between two entities. Multiple algorithms have been created to perform these analyses over the network model; however, when executed on a single machine they take a long time to complete, or may not run at all due to the machine's resource limitations. As more demanding applications have appeared to run these kinds of analysis algorithms, several parallel programming models and different kinds of hardware architectures have been created to deal with large data inputs, reduce execution time, save power consumption, and enhance the computational efficiency of each machine, also taking into account the application requirements.
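The structural features mentioned above can be illustrated with a minimal sketch. The toy graph and helper function names below are hypothetical, using only the Python standard library; real analyses would run these computations at scale on much larger networks:

```python
from collections import deque
from itertools import combinations

# Toy undirected graph as an adjacency map: node -> set of neighbours.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

def triangles(g):
    """Count triangles: sets of 3 mutually connected nodes."""
    return sum(
        1
        for u, v, w in combinations(g, 3)
        if v in g[u] and w in g[u] and w in g[v]
    )

def clustering(g, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = g[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(nbrs, 2) if y in g[x])
    return 2 * links / (k * (k - 1))

def shortest_path_length(g, src, dst):
    """Unweighted shortest path length via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nbr in g[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None  # dst unreachable from src

print(triangles(graph))                       # 1 (the triangle a-b-c)
print(clustering(graph, "b"))                 # 1/3 of b's neighbour pairs are linked
print(shortest_path_length(graph, "a", "d"))  # 2 (a -> b -> d)
```

The same measures are what the parallel algorithms discussed later must compute, but over networks far too large for this naive single-machine approach.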
Parallelizing these algorithms is a challenge because: • We need to analyze data dependences to implement a parallel version of the algorithm, always keeping in mind the scalability and the performance of the code. • We must implement the algorithm for a parallel programming model, such as MapReduce (Apache Hadoop), RDD (Apache Spark), or Pregel (Apache Giraph), which are oriented to Big Data, or HPC models such as MPI + OpenMP, OmpSs, or CUDA. • The input data must be distributed over the processing platform to each node, or offloaded onto accelerators such as GPUs or FPGAs. • Storing the input data and the results of the processing requires distributed file systems (HDFS), distributed NoSQL databases (object databases, graph databases, document databases), or traditional relational databases (Oracle, SQL Server). In this Master Thesis, we decided to perform graph processing using Apache Big Data tools, mainly testing several community detection algorithms on MareNostrum III and the Amazon cloud, using SNAP graphs with ground-truth communities, and comparing their parallel execution times and scalability.
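To make the MapReduce model mentioned above concrete, the sketch below simulates its three phases (map, shuffle, reduce) in plain Python for a trivial graph task: computing node degrees from an edge list. The edge data and function names are hypothetical; in Apache Hadoop the map and reduce functions would run distributed across many machines, with the framework performing the shuffle:

```python
from collections import defaultdict

# Edge list of a small undirected toy graph (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]

def map_phase(edge):
    """Map: emit a (node, 1) pair for each endpoint of an edge."""
    u, v = edge
    return [(u, 1), (v, 1)]

def reduce_phase(key, values):
    """Reduce: sum the counts for one node, yielding its degree."""
    return key, sum(values)

# Shuffle: group all mapped pairs by key, as the framework would do
# between the distributed map and reduce stages.
grouped = defaultdict(list)
for edge in edges:
    for key, value in map_phase(edge):
        grouped[key].append(value)

degrees = dict(reduce_phase(k, vs) for k, vs in grouped.items())
print(degrees)  # {'a': 2, 'b': 3, 'c': 2, 'd': 1}
```

Community detection algorithms are considerably harder to express in this model than degree counting, since they are iterative and need global graph state, which is one reason the thesis compares several frameworks and platforms.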