1st High Performance Graph Mining workshop, Sydney, 10 August 2015

http://hdl.handle.net/2117/76366 2024-04-25T09:21:04Z

GraSP: distributed streaming graph partitioning
http://hdl.handle.net/2117/76383
Battaglino, Casey; Pienta, Pienta; Vuduc, Richard
This paper presents a distributed, streaming graph partitioner, Graph Streaming Partitioner (GraSP), which makes partition decisions as each vertex is read from memory, simulating an online algorithm that must process nodes as they arrive. GraSP is a lightweight high-performance computing (HPC) library implemented in MPI, designed to be easily substituted for existing HPC partitioners such as ParMETIS. It is the first MPI implementation for streaming partitioning of which we are aware, and is empirically orders of magnitude faster than existing partitioners while providing comparable partitioning quality. We demonstrate the scalability of GraSP on up to 1024 compute nodes of NERSC's Edison supercomputer. Given a minute of run-time, GraSP can partition a graph three orders of magnitude larger than ParMETIS can.
2015-07-29T11:40:29Z
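The abstract does not spell out GraSP's per-vertex scoring rule, so the following is only a single-process Python sketch of one well-known one-pass heuristic, Linear Deterministic Greedy (LDG), used here as a stand-in; it omits the MPI distribution entirely. Each vertex arrives with its adjacency list and is placed exactly once in the part holding the most of its already-placed neighbors, with that count damped by how full the part is relative to a capacity of n/k. The function name `ldg_stream_partition` and the (vertex, neighbor list) stream format are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def ldg_stream_partition(
    stream: Iterable[Tuple[int, List[int]]],
    num_vertices: int,
    k: int,
) -> Dict[int, int]:
    """One-pass Linear Deterministic Greedy partitioning (illustrative stand-in, not GraSP's code)."""
    capacity = num_vertices / k  # balanced target size per part
    sizes = [0] * k
    assignment: Dict[int, int] = {}
    for v, neighbors in stream:
        # Count neighbors that have already been placed in each part.
        counts = [0] * k
        for u in neighbors:
            part = assignment.get(u)
            if part is not None:
                counts[part] += 1
        # Score = neighbor affinity * (1 - fullness); equal scores go to the smaller part.
        best = max(
            range(k),
            key=lambda i: (counts[i] * (1.0 - sizes[i] / capacity), -sizes[i]),
        )
        assignment[v] = best  # the decision is final: no revisiting in a single pass
        sizes[best] += 1
    return assignment


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3, streamed in vertex order.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(ldg_stream_partition(adj.items(), num_vertices=6, k=2))
```

On the two-triangle example with k = 2, the sketch places {0, 1, 2} in part 0 and {3, 4, 5} in part 1, so only the bridging edge 2-3 is cut. Because every decision is made when the vertex arrives, the result depends on stream order, which is the trade-off an online partitioner accepts for speed.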

Parallel k nearest neighbor graph construction using tree-based data structures
http://hdl.handle.net/2117/76382
Rajani, Nazneen; McArdle, Kate; Dhillon, Inderjit S.
Construction of a nearest neighbor graph is often a necessary step in many machine learning applications. However, constructing such a graph is computationally expensive, especially when the data is high dimensional. Python's open source machine learning library Scikit-learn uses k-d trees and ball trees to implement nearest neighbor graph construction. However, this implementation is inefficient for large datasets. In this work, we focus on exploiting these underlying tree-based data structures to optimize parallel execution of the nearest neighbor algorithm. We present parallel implementations of nearest neighbor graph construction using such tree structures, with parallelism provided by OpenMP and the Galois framework. We empirically show that our parallel and exact approach is efficient as well as scalable, compared to the Scikit-learn implementation. We present the first implementation of k-d trees and ball trees using Galois. Our results show that k-d trees are faster when the number of dimensions is small (2^d ≪ N); ball trees, on the other hand, scale well with the number of dimensions. Our implementation of ball trees in Galois has almost linear speedup on a number of datasets irrespective of the size and dimensionality of the data.
2015-07-29T11:32:22Z
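As an illustration of the data structure the abstract builds on, here is a minimal, sequential Python sketch of exact k-nearest-neighbor graph construction with a k-d tree. It assumes Euclidean distance and that a point is not its own neighbor; it is neither the authors' OpenMP/Galois code nor scikit-learn's implementation, and the names `build_kdtree`, `knn_query`, and `knn_graph` are hypothetical. The pruning test (only cross a splitting plane when it is closer than the current k-th best candidate) is what makes k-d trees effective in low dimensions and what weakens as d grows.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    """One k-d tree node: a point index plus the axis it splits on."""

    def __init__(self, idx: int, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]") -> None:
        self.idx, self.axis, self.left, self.right = idx, axis, left, right


def build_kdtree(points: Sequence[Point], indices: List[int], depth: int = 0) -> Optional[KDNode]:
    """Recursively split on the median, cycling through the coordinate axes."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(
        indices[mid], axis,
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_query(root: Optional[KDNode], points: Sequence[Point], q_idx: int, k: int) -> List[int]:
    """Exact k nearest neighbors of points[q_idx], excluding the point itself."""
    q = points[q_idx]
    heap: List[Tuple[float, int]] = []  # max-heap via negated squared distances

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        if node.idx != q_idx:
            d = sq_dist(q, points[node.idx])
            if len(heap) < k:
                heapq.heappush(heap, (-d, node.idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, node.idx))
        diff = q[node.axis] - points[node.idx][node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it is closer than the current k-th best.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [i for _, i in sorted((-neg_d, i) for neg_d, i in heap)]


def knn_graph(points: Sequence[Point], k: int) -> Dict[int, List[int]]:
    root = build_kdtree(points, list(range(len(points))))
    # Every query is independent of the others; this is the loop a parallel
    # (e.g. OpenMP or Galois) version would spread across threads.
    return {i: knn_query(root, points, i, k) for i in range(len(points))}
```

For example, `knn_graph([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0), (7.5, 0.0)], 1)` links each point to its single nearest neighbor, sorted by distance. A ball tree would replace the axis-aligned split with a center-and-radius bound, which degrades less as the dimension grows.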

Fast, exact graph diameter computation with vertex programming
http://hdl.handle.net/2117/76371
Pennycuff, Corey; Weninger, Tim
In graph theory, the diameter is an important topological metric for understanding the size and density of a graph. Unfortunately, the graph diameter is computationally difficult to measure for even moderately sized graphs, so much so that approximation algorithms are commonly used instead of exact measurements. In this paper, we present a new algorithm to measure the exact diameter of unweighted graphs using vertex programming, which is easily distributed. We also show the practical performance of the algorithm in comparison to other widely available algorithms and implementations, as well as the unreliable accuracy of some pseudo-diameter estimators.
2015-07-29T08:28:55Z
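The abstract does not describe the vertex-programming algorithm itself, so the following Python sketch shows two standard reference points instead, not the authors' method. For a connected, undirected, unweighted graph, the diameter is the largest eccentricity, max over u of max over v of d(u, v); computing it exactly with one BFS per vertex costs O(nm) time for n vertices and m edges. The double-sweep heuristic, one common pseudo-diameter estimator (we do not claim it is among those evaluated in the paper), runs only two BFS passes and returns the eccentricity of a single vertex, so it can only underestimate. The function names and the adjacency-dict input are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]


def bfs_farthest(g: Graph, source: int) -> Tuple[int, int]:
    """Return (eccentricity of source, a vertex at that distance) via BFS."""
    dist = {source: 0}
    queue = deque([source])
    last = source
    while queue:
        u = queue.popleft()
        last = u  # BFS dequeues in nondecreasing distance, so the last one is farthest
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist[last], last


def exact_diameter(g: Graph) -> int:
    """Exact diameter: the largest eccentricity, one BFS per vertex (O(n * m))."""
    return max(bfs_farthest(g, v)[0] for v in g)


def double_sweep(g: Graph, start: int) -> int:
    """Pseudo-diameter: BFS to a far vertex a, then report ecc(a). Only a lower bound."""
    _, a = bfs_farthest(g, start)
    return bfs_farthest(g, a)[0]


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(exact_diameter(path), double_sweep(path, 2))
```

On trees, such as the path in the example, the double sweep is known to be exact; on general graphs it can fall short of the true diameter, which is why an exact method that avoids the full n-BFS cost is worth having.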