DORAS | DCU Research Repository

An automated ETL for online datasets

McCarthy, Suzanne, McCarren, Andrew (ORCID: 0000-0002-7297-0984) and Roantree, Mark (2019) An automated ETL for online datasets. In: 23rd Enterprise Computing Conference (EDOC), 28-31 Oct 2019, Paris, France.

Abstract
While using online datasets for machine learning is commonplace today, the quality of these datasets affects the performance of prediction algorithms. One method for improving the semantics of new data sources is to map these sources to a common data model or ontology. While semantic and structural heterogeneities must still be resolved, this provides a well-established approach to providing clean datasets, suitable for machine learning and analysis. However, when there is a requirement for close to real-time usage of online data, a method for dynamic Extract-Transform-Load of new source data must be developed. In this work, we present a framework for integrating online and enterprise data sources, in close to real time, to provide datasets for machine learning and predictive algorithms. An exhaustive evaluation compares a human-built data transformation process with our system’s machine-generated ETL process, with very favourable results, illustrating the value and impact of an automated approach.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: ETL; data warehousing; data transformation; data mining; data models
Subjects: Computer Science > Machine learning
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in: 2019 IEEE 23rd International Enterprise Distributed Object Computing Conference (EDOC). IEEE.
Publisher: IEEE
Official URL: http://dx.doi.org/10.1109/EDOC.2019.00030
Copyright Information: © 2019 The Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 23538
Deposited On: 14 May 2020 13:58 by Suzanne Mc Carthy. Last Modified 14 May 2020 13:58
Documents

Full text available as:

PDF (_Suzanne__EDOC_long.pdf), 308kB