Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/40223

Title: Cleenex: Support for User Involvement During an Iterative Data Cleaning Process
Authors: L. M. Pereira, João
Fonseca, Manuel J.
Lopes, Antónia
Galhardas, Helena
Editors: Demartini, Gianluca
Sadiq, Shazia
Yang, Jie
Keywords: Data quality
data curation
user involvement
human-in-the-loop
Issue Date: 15-Feb-2024
Publisher: Association for Computing Machinery
Citation: Pereira, J. L. M., Fonseca, M. J., Lopes, A., & Galhardas, H. (2024). Cleenex: Support for User Involvement during an Iterative Data Cleaning Process. Journal of Data and Information Quality, 16(1), Article 6. https://doi.org/10.1145/3648476
Abstract: The existence of large amounts of data increases the probability that data quality problems occur. A data cleaning process that corrects these problems is usually iterative, because it may need to be re-executed and refined to produce high-quality data. Moreover, due to the specificity of some data quality problems and the limited ability of data cleaning programs to cover all problems, a user often has to be involved during program executions to repair data manually. However, there is no data cleaning framework that appropriately supports this involvement in such an iterative process, a form of human-in-the-loop, to clean structured data. Furthermore, data preparation tools that involve the user in data cleaning processes in some way have not been evaluated with real users to assess their effort. Therefore, we propose Cleenex, a data cleaning framework that supports user involvement during an iterative data cleaning process, and we conducted two experimental evaluations of data cleaning: an assessment, with a simulated user, of the Cleenex components that support the user when manually repairing data; and a comparison, in terms of user involvement, of data preparation tools with real users. Results show that Cleenex components reduce user effort when manually cleaning data during a data cleaning process; for example, the number of tuples visualized is reduced by 99%. Moreover, when performing data cleaning tasks with Cleenex, real users need less time and effort (e.g., half the clicks) and, based on questionnaires, prefer it to the other tools used for comparison, OpenRefine and Pentaho Data Integration.
URI: https://dl.acm.org/doi/10.1145/3648476
http://hdl.handle.net/10174/40223
ISSN: 1936-1955
Type: article
Appears in Collections: INF - Publications - Articles in Peer-Reviewed International Journals

Files in This Item:

File | Description | Size | Format
3648476.pdf | | 3.55 MB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
