Automatic error localisation for categorical, continuous and integer data
Cite as:
hdl:2099/3757
Document type: Article
Publication date: 2005
Publisher: Institut d'Estadística de Catalunya
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons licence: Attribution-NonCommercial-NoDerivs 2.5 Spain
Abstract
Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent or not. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic error localisation based on the Fellegi-Holt paradigm, which says that the data should be made to satisfy all edits by changing the fewest possible number of fields. Adoption of this paradigm leads to a mathematical optimisation problem. We propose an algorithm for solving this optimisation problem for a mix of categorical, continuous and integer-valued data. We also propose a heuristic procedure based on the exact algorithm. For five realistic data sets involving only integer-valued variables we evaluate the performance of this heuristic procedure.
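
To make the Fellegi-Holt paradigm concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm or the heuristic proposed in the article; it only illustrates the optimisation problem the abstract describes: find a smallest set of fields that can be changed so that every edit is satisfied. The record, the field names, the finite domains and the two edits are hypothetical, chosen for illustration. Continuous variables are left out because exhaustive enumeration needs finite domains.

```python
from itertools import combinations, product


def localise_errors(record, domains, edits):
    """Return (changed_fields, repaired_record) for a smallest set of fields
    whose values can be replaced so that all edits hold, or None if no
    repair exists within the given domains.

    Exhaustive search: subsets of fields are tried in order of increasing
    size, so the first feasible subset has minimum cardinality.
    """
    fields = list(record)
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            # Try every assignment of domain values to the chosen fields.
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(edit(candidate) for edit in edits):
                    return set(subset), candidate
    return None


# Hypothetical record mixing one categorical and four integer fields.
record = {"age": 10, "marital": "married",
          "turnover": 100, "costs": 30, "profit": 50}

# Hypothetical finite domains (an assumption of this sketch).
domains = {
    "age": range(0, 121),
    "marital": ["single", "married"],
    "turnover": range(0, 201),
    "costs": range(0, 201),
    "profit": range(-200, 201),
}

# Hypothetical edits: each returns True when the record is consistent.
edits = [
    lambda r: r["marital"] != "married" or r["age"] >= 16,
    lambda r: r["profit"] == r["turnover"] - r["costs"],
]

changed, repaired = localise_errors(record, domains, edits)
print(sorted(changed), repaired)
```

The example record violates both edits, and no single field appears in both, so at least two fields must be changed. The search cost grows exponentially with the number of fields and the domain sizes, which is why a dedicated exact algorithm or a heuristic is needed for realistic data.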
Citation: Waal, Ton de. "Automatic error localisation for categorical, continuous and integer data". SORT, 2005, Vol. 29, no. 1
ISSN: 1696-2281
Files | Description | Size | Format | View |
---|---|---|---|---|
article.pdf | | 241.6 KB | | View/Open |