DSpace Collection:
http://hdl.handle.net/2117/639
2014-07-25T17:41:12Z
Generalized median string computation by means of string embedding in vector spaces
http://hdl.handle.net/2117/19430
Title: Generalized median string computation by means of string embedding in vector spaces
Authors: Jiang, Xiaoyi; Wentker, Jöran; Ferrer Sumsi, Miquel
Abstract: In structural pattern recognition, the median string has been established as a useful tool to represent a set of strings. However, its exact computation is complex and computationally expensive. In this paper we propose a new approach for computing the median string based on string embedding. Strings are embedded into a vector space and the median is computed in the vector domain. We apply three different inverse transformations to go from the vector domain back to the string domain in order to obtain a final approximation of the median string. All of them are based on the weighted mean of a pair of strings. Experiments show that we succeed in computing good approximations of the median string.
2013-05-28T16:06:00Z
Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines
http://hdl.handle.net/2117/16552
Title: Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines
Authors: Domínguez Sal, David; Aguilar Saborit, Josep; Surdeanu, Mihai; Larriba Pey, Josep
Description: We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximate record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node.
We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy: one for the distribution of the cache contents, ESC-placement, and another for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document.
We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
2012-09-21T09:29:32Z
Social based layouts for the increase of locality in graph operations
http://hdl.handle.net/2117/13533
Title: Social based layouts for the increase of locality in graph operations
Authors: Prat Pérez, Arnau; Domínguez Sal, David; Larriba Pey, Josep
Abstract: Graphs provide a natural data representation for analyzing the relationships among entities in many application areas. Since the analysis algorithms perform memory-intensive operations, it is important that the graph layout is adapted to take advantage of the memory hierarchy. Here, we propose layout strategies based on community detection to improve the in-memory data locality of generic graph algorithms. We conclude that the detection of communities in a graph provides a layout strategy that improves the performance of graph algorithms consistently over other state-of-the-art strategies.
2011-10-17T11:15:59Z
Cooperative cache analysis for distributed search engines
http://hdl.handle.net/2117/13116
Title: Cooperative cache analysis for distributed search engines
Authors: Domínguez Sal, David; Pérez Casany, Marta; Larriba Pey, Josep
Abstract: In this paper, we study the performance of a distributed search engine from a data caching point of view using statistical tools on a varied set of configurations. We study two strategies to achieve better performance: cache-aware load balancing, which issues the queries to nodes that store the computation in cache; and cooperative caching (CC), which stores and transfers the available computed contents from one node in the network to others. Since cache-aware decisions depend on information about the recent history, we also analyse how the ageing of this information impacts the system performance. Our results show that the combination of both strategies yields better throughput than implementing either cooperative caching or cache-aware load balancing alone, because of a synergistic improvement of the hit rate. Furthermore, the analysis concludes that the data structures to monitor the system need only moderate precision to achieve optimal throughput.
2011-08-25T10:59:55Z