Link prediction in very large directed graphs: Exploiting hierarchical properties in parallel
Document type: Conference report
Rights access: Open Access
Link prediction is a link mining task that tries to find new edges within a given graph. Among the targets of link prediction are large directed graphs, which are frequent structures nowadays. The typical sparsity of large graphs demands high-precision predictions in order to obtain usable results. However, the size of those graphs only permits the execution of scalable algorithms. As a trade-off between those two problems, we recently proposed a link prediction algorithm for directed graphs that exploits hierarchical properties. The algorithm can be classified as a local score, which entails scalability. Unlike other local scores, our proposal assumes the existence of an underlying model for the data, which allows it to produce predictions with higher precision. We test the validity of its hierarchical assumptions on two clearly hierarchical data sets, one of them based on RDF. Then we test it on a non-hierarchical data set based on Wikipedia to demonstrate its broad applicability. Given the computational complexity of link prediction in very large graphs, we also introduce some general recommendations for making link prediction an efficiently parallelized problem.
Citation: García-Gasulla, D.; Cortés, C. Link prediction in very large directed graphs: Exploiting hierarchical properties in parallel. In: International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data. "Proceedings of the 3rd Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data co-located with 11th Extended Semantic Web Conference (ESWC 2014): Crete, Greece, May 25, 2014". Crete: CEUR-WS.org, 2014, p. 1-13.