
DORAS | DCU Research Repository


Reusing dynamic data marts for query management in an on-demand ETL architecture

McCarthy, Suzanne (2021) Reusing dynamic data marts for query management in an on-demand ETL architecture. PhD thesis, Dublin City University.

Abstract
Data analysts often need to integrate an in-house data warehouse with external datasets, especially web-based datasets. Doing so can give them important insights into their performance compared with competitors and with their industry on a global scale, and can allow them to make predictions about sales, providing important decision support services. The quality of these insights depends on the quality of the data imported into the analysis dataset. There is a wealth of data freely available from government sources online but little unity between data sources, leading to a requirement for a data processing layer in which various types of quality issues and heterogeneities can be resolved. Traditionally, this is achieved with an Extract-Transform-Load (ETL) series of processes which are performed on all of the available data, in advance, in a batch process typically run outside of business hours. While this is recognized as a powerful knowledge-based support, it is very expensive to build and maintain, and very costly to update when new data sources become available. On-demand ETL offers a solution in that data is only acquired when needed and new sources can be added as they come online. However, this form of dynamic ETL is very difficult to deliver. In this research dissertation, we explore the possibilities of creating dynamic data marts that can be built from non-warehouse data to support the inclusion of new sources. We then examine how these dynamic structures can be used for query fulfillment and how they can support an overall on-demand query mechanism. At each step of the research and development, we employ a robust validation using a real-world data warehouse from the agricultural domain with selected Agri web sources to test the dynamic elements of the proposed architecture.
Metadata
Item Type: Thesis (PhD)
Date of Award: March 2021
Refereed: No
Supervisor(s): Roantree, Mark and McCarren, Andrew
Uncontrolled Keywords: data warehousing; data modelling; data integration
Subjects: Computer Science > Information storage and retrieval systems
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > INSIGHT Centre for Data Analytics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code: 25228
Deposited On: 11 Mar 2021 10:55 by Suzanne Mc Carthy. Last Modified 11 Mar 2021 10:55
Documents

Full text available as:

PDF (_Suzanne__PhD_dissertation_submission_version.pdf), 3MB