Partial symbol ordering distance
Document type: Conference report
Rights access: Restricted access - publisher's policy
Sequences of symbols are becoming increasingly important, as they are the standard format for representing information in a wide variety of domains, such as ontologies, sequential patterns, or non-numerical attributes in databases. The development of new distances for this kind of data is therefore a crucial need. Many similarity functions have recently been proposed for managing sequences of symbols; however, such functions do not always satisfy the triangle inequality. This property is a mandatory requirement in many data mining algorithms, such as clustering or k-nearest neighbors, which require a metric space. In this paper, we propose a new distance for sequences of (non-repeated) symbols based on the partial distances between the positions of the common symbols. We prove that this Partial Symbol Ordering distance satisfies the triangle inequality, and we describe a set of experiments supporting the claim that the new distance outperforms the Edit distance in those scenarios where sequence similarity is related to the positions occupied by the symbols.
Citation: Herranz, J.; Nin, J. Partial symbol ordering distance. A: International Conference on Modeling Decisions for Artificial Intelligence. "6th International Conference on Modeling Decisions for Artificial Intelligence". Awaji Isl (Japan): Springer Verlag, 2009, p. 293-302.