dc.contributor.author | Gibert, Karina |
dc.contributor.author | Sànchez-Marrè, Miquel |
dc.contributor.author | Izquierdo, Joaquín |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
dc.date.accessioned | 2018-11-05T09:16:13Z |
dc.date.available | 2018-11-05T09:16:13Z |
dc.date.issued | 2016-12 |
dc.identifier.citation | Gibert, Karina, Sànchez-Marrè, M., Izquierdo, J. A survey on pre-processing techniques: relevant issues in the context of environmental data mining. "AI communications: the european journal of artificial intelligence", December 2016, vol. 29, no. 6, p. 627-663. |
dc.identifier.issn | 0921-7126 |
dc.identifier.uri | http://hdl.handle.net/2117/123530 |
dc.description.abstract | One of the important issues in every type of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Models built on incorrect or incomplete data will be useless. As a consequence, the quality of decisions based on these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not yet been properly systematized, and little research focuses on it. This paper presents a survey of the most popular pre-processing steps required in environmental data analysis, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to a non-expert user, who, after reading them, can decide which technique is most suitable for solving his/her problem. |
dc.format.extent | 37 p. |
dc.language.iso | eng |
dc.publisher | IOS Press |
dc.subject | Àrees temàtiques de la UPC::Matemàtiques i estadística::Matemàtica aplicada a les ciències |
dc.subject | Àrees temàtiques de la UPC::Matemàtiques i estadística::Estadística matemàtica |
dc.subject | Àrees temàtiques de la UPC::Matemàtiques i estadística::Anàlisi numèrica |
dc.subject.lcsh | Artificial intelligence |
dc.subject.lcsh | Survival analysis (Biometry) |
dc.subject.lcsh | Numerical analysis--Simulation methods |
dc.subject.other | Pre-processing |
dc.subject.other | data quality |
dc.subject.other | data mining |
dc.subject.other | knowledge discovery from databases |
dc.subject.other | multidisciplinary approach |
dc.subject.other | environmental systems |
dc.title | A survey on pre-processing techniques: relevant issues in the context of environmental data mining |
dc.type | Article |
dc.subject.lemac | Intel·ligència artificial |
dc.subject.lemac | Anàlisi de supervivència (Biometria) |
dc.subject.lemac | Anàlisi numèrica |
dc.contributor.group | Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic |
dc.identifier.doi | 10.3233/AIC-160710 |
dc.description.peerreviewed | Peer Reviewed |
dc.subject.ams | Classificació AMS::68 Computer science::68T Artificial intelligence |
dc.subject.ams | Classificació AMS::62 Statistics::62N Survival analysis and censored data |
dc.subject.ams | Classificació AMS::65 Numerical analysis::65C Probabilistic methods, simulation and stochastic differential equations |
dc.relation.publisherversion | http://content.iospress.com/articles/ai-communications/aic710 |
dc.rights.access | Open Access |
local.identifier.drac | 19259942 |
dc.description.version | Postprint (author's final draft) |
local.citation.author | Gibert, Karina; Sànchez-Marrè, M.; Izquierdo, J. |
local.citation.publicationName | AI communications: the european journal of artificial intelligence |
local.citation.volume | 29 |
local.citation.number | 6 |
local.citation.startingPage | 627 |
local.citation.endingPage | 663 |