DORAS | DCU Research Repository

Data quality problems in TPC-DI based data integration processes

Yang, Qishan, Ge, Mouzhi and Helfert, Markus (ORCID: 0000-0001-6546-6408) (2018) Data quality problems in TPC-DI based data integration processes. In: 19th International Conference, ICEIS 2017, 26-29 Apr 2017, Porto, Portugal. ISBN 978-3-319-93374-0

Abstract
Many data-driven organisations need to integrate data from multiple, distributed and heterogeneous sources for advanced data analysis. A data integration system is an essential component for collecting data into a data warehouse or other data analytics systems. Various data integration systems are available, either built in-house or provided by vendors, so an organisation needs to compare and benchmark them when choosing one that meets its requirements. TPC-DI was recently proposed as the first industrial benchmark for evaluating data integration systems. When using this benchmark, we found some typical data quality problems in the TPC-DI data source, such as multi-meaning attributes and inconsistent data schemas, which could delay the data integration process or even cause it to fail. This paper explains the processes of this benchmark and summarises the typical data quality problems identified in the TPC-DI data source. Furthermore, in order to prevent data quality problems and proactively manage data quality, we propose a set of practical guidelines for researchers and practitioners to conduct data quality management when using the TPC-DI benchmark.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Data quality; Data integration; TPC-DI Benchmark; ETL
Subjects: Computer Science > Information technology; Computer Science > Information storage and retrieval systems
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics; Research Institutes and Centres > ADAPT
Published in: Hammoudi, Slimane, Śmiałek, Michał, Camp, Olivier and Filipe, Joaquim (eds.) Enterprise Information Systems. Lecture Notes in Business Information Processing (LNBIP) 321. Springer. ISBN 978-3-319-93374-0
Publisher: Springer
Official URL: https://doi.org/10.1007/978-3-319-93375-7_4
Copyright Information: © 2018 Springer. The original publication is available at www.springerlink.com
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland grant SFI/12/RC/2289
ID Code: 22315
Deposited On: 26 Jun 2018 12:51 by Qishan Yang. Last modified 13 Mar 2019 14:41
Documents

Full text available as:

2018_Data_Quality_Problems_in_TPC-DI_Based_Data_Integration_Processes.pdf (PDF, 759kB)