In this paper we study the performance of a distributed search engine from the point of view of data caching. We compare and combine two approaches to achieving higher hit rates: (a) sending each query to the node that currently holds the related data in its local memory (cache-aware load balancing), and (b) sending cached contents to the node where a query is currently being processed (cooperative caching). Furthermore, we identify the best scheduling points in the query computation at which a query can be reassigned to another node, and how this reassignment should be performed. Our analysis is guided by statistical tools applied to a real question answering system under several query distributions typically found in query logs.
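The first approach, cache-aware load balancing, can be illustrated with a minimal sketch: route each query to the node whose local cache already holds the related data, falling back to the least-loaded node on a miss. All class and function names below are illustrative assumptions, not taken from the paper, and the LRU policy is one possible choice.

```python
# Minimal sketch of cache-aware load balancing: queries are routed to the
# node whose local cache already holds the related data; on a cache miss,
# the least-loaded node is chosen and its cache is updated (LRU eviction).
# Names and policies here are illustrative, not from the paper.

class Node:
    def __init__(self, name, capacity=3):
        self.name = name
        self.capacity = capacity
        self.cache = []      # LRU list of query terms, most recent last
        self.load = 0        # number of queries processed so far

    def process(self, term):
        """Process a query; return True on a cache hit."""
        self.load += 1
        if term in self.cache:           # cache hit: refresh recency
            self.cache.remove(term)
            self.cache.append(term)
            return True
        if len(self.cache) >= self.capacity:
            self.cache.pop(0)            # evict least recently used entry
        self.cache.append(term)
        return False

def route(nodes, term):
    """Cache-aware routing: prefer the node already caching `term`."""
    for node in nodes:
        if term in node.cache:
            return node
    return min(nodes, key=lambda n: n.load)  # fall back to least loaded

nodes = [Node("n0"), Node("n1")]
hits = 0
for term in ["barcelona", "seoul", "barcelona", "seoul", "barcelona"]:
    node = route(nodes, term)
    hits += node.process(term)
print(hits)  # prints 3: repeated terms hit the node that first served them
```

Cooperative caching would instead keep the query where it is and ship the cached entry between nodes; the paper's contribution is comparing and combining the two under realistic query distributions.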
Citation: Dominguez, D.; Pérez-Casany, M.; Larriba, J. "Cache-aware load balancing vs. cooperative caching for distributed search engines." In: 11th IEEE International Conference on High Performance Computing and Communications. Seoul: IEEE Computer Society Publications, 2009, pp. 415-423.