McCarthy, Suzanne, McCarren, Andrew (ORCID: 0000-0002-7297-0984) and Roantree, Mark (2019) A method for automated transformation and validation of online datasets. In: 23rd IEEE-EDOC Conference: The Enterprise Computing Conference, 28-31 Oct 2019, Paris, France.
Abstract
While using online datasets for machine learning
is commonplace today, the quality of these datasets affects
the performance of prediction algorithms. One method for
improving the semantics of new data sources is to map these
sources to a common data model or ontology. While semantic
and structural heterogeneities must still be resolved, this provides
a well-established approach to providing clean datasets, suitable
for machine learning and analysis. However, when there is a
requirement for close to real-time use of online data, a
method for dynamic Extract-Transform-Load (ETL) of new source
data must be developed. In this work, we present a framework for
integrating online and enterprise data sources, in close to real
time, to provide datasets for machine learning and predictive
algorithms. An exhaustive evaluation compares a human-built
data transformation process with our system’s machine-generated
ETL process, with very favourable results, illustrating the value
and impact of an automated approach.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Data Engineering; ETL; Data Transformation |
Subjects: | UNSPECIFIED |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics |
Copyright Information: | © 2019 The Authors |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
ID Code: | 23636 |
Deposited On: | 11 Nov 2019 14:17 by Suzanne McCarthy. Last Modified 13 Dec 2019 13:40 |