Specification and Implementation of Aggregation in Graph Query Languages
Document type: Master thesis (pre-Bologna period)
Rights access: Restricted access - confidentiality agreement
The analysis of large amounts of data has become an important task in information extraction, and it requires the computation of aggregate functions. Although mature approaches exist for conventional relational data, aggregation on graph-based data is still a young field. At the same time, many fields, such as logistics and bioinformatics, model their data as large graphs. To extract richer knowledge from this stored data, the user must be able to retrieve information about the graph structure. In the particular case of logistics, edges and nodes also carry a large number of attributes, which must be considered when searching for a graph pattern. Graph query languages have been developed to address these information needs. Large graphs must also be accessed with efficient techniques. This can be achieved by using relational databases, but SQL loses its efficiency and its declarative character when querying graphs. In this thesis, we aim to change this by combining the properties of a declarative graph query language with the efficiency of relational databases, and thus to obtain a declarative way to formulate aggregate functions over graph structures. To this end, we first present a list of requirements for an appropriate graph query language in a logistics scenario, together with a thorough analysis of the existing graph query languages. After choosing one graph query language for its uniform data model and declarative language, we design and implement an interpreter that maps it onto relational databases. Taking into account the constraints defined in a query, we finally propose optimization techniques for the efficient retrieval of the searched graph pattern using the recursion available in SQL-99. We evaluate our approach on both synthetic and real data. The results demonstrate that the applied optimizations improve the performance of our interpreter, and that relational databases outperform the Prolog-based implementation of the chosen graph query language, achieving better execution times on large graphs.
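As a rough illustration of the idea summarized above (aggregating over a graph stored in relational tables by means of SQL-99 style recursion), the following minimal sketch may help. It is not the interpreter developed in the thesis. SQLite stands in for the relational database, and the `edge` table, its columns, the `weight` attribute and the function name `aggregate_reachable` are all hypothetical.

```python
import sqlite3

# Hypothetical edge list: (source node, target node, edge attribute "weight").
EDGES = [("A", "B", 2.0), ("B", "C", 3.0), ("C", "A", 1.0), ("D", "A", 5.0)]

def aggregate_reachable(edges, start):
    """Count and sum the weights of edges leaving nodes reachable from start."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT, weight REAL)")
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)
    row = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT :start                 -- seed: the start node itself
            UNION                         -- UNION drops duplicates, so cycles terminate
            SELECT e.dst
            FROM edge AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT COUNT(*), SUM(e.weight)    -- aggregate over the matched edges
        FROM edge AS e JOIN reach AS r ON e.src = r.node
        """,
        {"start": start},
    ).fetchone()
    conn.close()
    return row

if __name__ == "__main__":
    print(aggregate_reachable(EDGES, "A"))
```

The recursive common table expression collects the reachable nodes, and the outer query applies ordinary aggregate functions to the edges attached to them. Using `UNION` rather than `UNION ALL` is a deliberate choice in this sketch so that cyclic graphs do not cause unbounded recursion.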
Project carried out through a mobility program. Karlsruhe Institute of Technology. Department of Informatics, Institute for Program Structures and Data Organization (IPD)