Introducing semantic variables in mixed distance measures: Impact on hierarchical clustering
dc.contributor.author | Gibert, Karina |
dc.contributor.author | Valls Mateu, Aïda |
dc.contributor.author | Batet Sanromà, Montserrat |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa |
dc.date.accessioned | 2015-06-30T11:31:34Z |
dc.date.available | 2015-09-30T00:31:02Z |
dc.date.created | 2014-09-01 |
dc.date.issued | 2014-09-01 |
dc.identifier.citation | Gibert, Karina; Valls, A.; Batet, M. Introducing semantic variables in mixed distance measures: Impact on hierarchical clustering. "Knowledge and information systems", 1 September 2014, vol. 40, no. 3, p. 559-593. |
dc.identifier.issn | 0219-1377 |
dc.identifier.uri | http://hdl.handle.net/2117/28467 |
dc.description.abstract | It is well known that taking into account the semantic information available for categorical variables appreciably improves the meaningfulness of the final results of an analysis. The paper presents a generalization of Gibert's mixed metrics, which originally handled numerical and categorical variables, to also include semantic variables. Semantic variables are defined as categorical variables related to a reference ontology (ontologies are formal structures that model semantic relationships between the concepts of a certain domain). The superconcept-based distance (SCD) is introduced to compare semantic variables taking into account the information provided by the reference ontology. A benchmark shows the good performance of SCD with respect to other proposals from the literature for comparing semantic features. Gibert's mixed metrics are then generalized by incorporating SCD. Finally, two real applications based on tourism data show the impact of the generalized Gibert's metrics on clustering procedures and, consequently, the impact of taking the reference ontology into account in clustering. The main conclusion is that the reference ontology, when available, can appreciably improve the meaningfulness of the final clusters. |
dc.format.extent | 35 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Mathematics and statistics::Operations research::Mathematical programming |
dc.subject.lcsh | Operations research |
dc.subject.other | Clustering |
dc.subject.other | Metrics |
dc.subject.other | Numerical and Categorical variables |
dc.subject.other | Semantic data |
dc.subject.other | Ontology |
dc.subject.other | BACKGROUND KNOWLEDGE |
dc.subject.other | GENE ONTOLOGY |
dc.subject.other | SIMILARITY |
dc.subject.other | WEB |
dc.subject.other | RECOMMENDATIONS |
dc.subject.other | PROFILES |
dc.subject.other | TOURISM |
dc.subject.other | SYSTEMS |
dc.subject.other | METRICS |
dc.subject.other | DOMAIN |
dc.title | Introducing semantic variables in mixed distance measures: Impact on hierarchical clustering |
dc.type | Article |
dc.subject.lemac | Optimization and operations research |
dc.contributor.group | Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic |
dc.identifier.doi | 10.1007/s10115-013-0663-5 |
dc.description.peerreviewed | Peer Reviewed |
dc.subject.ams | 90B Operations research and management science |
dc.relation.publisherversion | http://link.springer.com/article/10.1007%2Fs10115-013-0663-5 |
dc.rights.access | Restricted access - publisher's policy |
local.identifier.drac | 12985413 |
dc.description.version | Postprint (published version) |
local.citation.author | Gibert, Karina; Valls, A.; Batet, M. |
local.citation.publicationName | Knowledge and information systems |
local.citation.volume | 40 |
local.citation.number | 3 |
local.citation.startingPage | 559 |
local.citation.endingPage | 593 |