An efficient design for a distributed filesystem originates from a thorough understanding of common access patterns and user behavior, which is obtained through detailed analysis of traces and snapshots. In this paper, we analyze traces from eight distributed filesystems that represent a mix of workloads taken from educational, research, and commercial environments. We focus on characterizing block access patterns, the amount of block sharing, and working set size over long periods of time, and we look for behaviors common to all workloads that can be generalized to other storage systems. We found that most environments shared large numbers of blocks over time, and that block sharing was significantly affected by repetitive human behavior. We also found that block lifetimes tended to be short, although a significant number of blocks had long lifetimes and were accessed over many consecutive days. Lastly, we determined that most daily accesses were made to a small set of blocks. We believe that these findings can be used to improve long-term caching policies as well as data placement algorithms, thus increasing the performance of distributed storage systems.
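As a rough illustration of the quantities named above, the following minimal sketch computes a daily working set size, the set of blocks shared between clients, and per-block lifetimes from a trace. The `(day, client, block)` record format, the function names, and the exact metric definitions are assumptions made for this example; the paper's own trace format and definitions may differ.

```python
from collections import Counter, defaultdict


def analyze_trace(records):
    """Summarize a trace of (day, client, block) tuples (hypothetical format)."""
    daily_blocks = defaultdict(set)   # day -> distinct blocks touched that day
    block_clients = defaultdict(set)  # block -> clients that accessed it
    block_days = defaultdict(set)     # block -> days on which it was accessed
    for day, client, block in records:
        daily_blocks[day].add(block)
        block_clients[block].add(client)
        block_days[block].add(day)

    # Working set size: number of distinct blocks accessed per day.
    working_set = {day: len(blocks) for day, blocks in daily_blocks.items()}
    # Shared blocks: accessed by more than one client (assumed definition).
    shared = {b for b, clients in block_clients.items() if len(clients) > 1}
    # Lifetime: days spanned from first to last access, inclusive (assumed).
    lifetime = {b: max(days) - min(days) + 1 for b, days in block_days.items()}
    return working_set, shared, lifetime


def top_fraction(records, day, k):
    """Fraction of one day's accesses that hit the k most accessed blocks."""
    counts = Counter(block for d, _, block in records if d == day)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total if total else 0.0


# Tiny made-up trace, for illustration only.
trace = [(0, "a", 1), (0, "b", 1), (0, "a", 2),
         (1, "a", 1), (2, "b", 3), (2, "b", 1)]
working_set, shared, lifetime = analyze_trace(trace)
```

A value of `top_fraction` close to 1 for a small `k` would correspond to the observation that most daily accesses go to a small set of blocks.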
Citation: Miranda, A.; Cortés, A. Analyzing long-term access locality to find ways to improve distributed storage systems. In: Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. "Proceedings - 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2012". 2012, p. 544-553.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder. If you wish to make any use of the work not provided for in the law, please contact: email@example.com