
dc.contributor.author: Gibert, Karina
dc.contributor.author: Izquierdo Sebastián, Joaquín
dc.contributor.author: Sànchez-Marrè, Miquel
dc.contributor.author: Hamilton, Serena H.
dc.contributor.author: Rodriguez Roda, Ignasi
dc.contributor.author: Holmes, Geoffrey
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.date.accessioned: 2019-06-04T09:04:03Z
dc.date.available: 2020-12-01T01:27:48Z
dc.date.issued: 2018-12
dc.identifier.citation: Gibert, K. [et al.]. Which method to use? An assessment of data mining methods in Environmental Data Science. "Environmental modelling & software", December 2018, vol. 110, p. 3-27.
dc.identifier.issn: 1364-8152
dc.identifier.uri: http://hdl.handle.net/2117/133895
dc.description.abstract: Data Mining (DM) is a fundamental component of the Data Science process. Over recent years a huge library of DM algorithms has been developed to tackle a variety of problems in fields such as medical imaging and traffic analysis. Many DM techniques are far more flexible than more classical numerical simulation or statistical modelling approaches, and could be usefully applied to data-rich environmental problems. Certain techniques, such as artificial neural networks, clustering, case-based reasoning and Bayesian networks, have been applied in environmental modelling, while other methods, such as support vector machines, have yet to be taken up on a wide scale. There is greater scope for many lesser-known techniques to be applied in environmental research, with the potential to help address some of the current open environmental challenges. However, selecting the best DM technique for a given environmental problem is not a simple decision, and there is a lack of guidelines and criteria to help data scientists and environmental scientists ensure effective knowledge extraction from data. This paper provides a broad introduction to the use of DM in Data Science processes for environmental researchers. The Data Science process comprises three main steps (pre-processing, data mining and post-processing). This paper provides a conceptualization of Environmental Systems and a conceptualization of DM methods, which lie at the core step of the Data Science process. These two elements define a conceptual framework that forms the basis of a new methodology for relating the characteristics of a given environmental problem to a family of Data Mining methods. The paper provides a general overview of DM techniques, with guidelines, for non-expert users, who can use this support to decide which technique is most suitable for the problem at hand. The decision depends on the two-dimensional relationship between the type of environmental system and the type of DM method. An illustrative two-way table containing references for each Environmental System-Data Mining method pair is presented and discussed. Some examples of how the proposed methodology supports DM method selection are also presented, and challenges and future trends are identified.
dc.format.extent: 25 p.
dc.language.iso: eng
dc.publisher: Elsevier
dc.subject: UPC subject areas::Mathematics and statistics::Mathematical statistics::Statistical methods
dc.subject: UPC subject areas::Mathematics and statistics::Mathematics applied to the sciences
dc.subject.lcsh: Sequences (Mathematics)
dc.subject.lcsh: Computer science
dc.subject.other: Data mining
dc.subject.other: Data science
dc.subject.other: Method selection
dc.subject.other: Multidisciplinarity
dc.subject.other: Environmental systems
dc.title: Which method to use? An assessment of data mining methods in Environmental Data Science
dc.type: Article
dc.subject.lemac: Seqüències (Matemàtica)
dc.subject.lemac: Informàtica
dc.contributor.group: Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic
dc.identifier.doi: 10.1016/j.envsoft.2018.09.021
dc.description.peerreviewed: Peer Reviewed
dc.subject.ams: AMS Classification::62 Statistics::62L Sequential methods
dc.subject.ams: AMS Classification::68 Computer science::68P Theory of data
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S1364815218308715
dc.rights.access: Open Access
local.identifier.drac: 23518076
dc.description.version: Postprint (author's final draft)
local.citation.author: Gibert, Karina; Izquierdo, J.; Sànchez-Marrè, M.; Hamilton, S.; Rodriguez-Roda, I.; Holmes, G.
local.citation.publicationName: Environmental modelling & software
local.citation.volume: 110
local.citation.startingPage: 3
local.citation.endingPage: 27

