Constructing data marts from web sources using a graph common model
Scriney, Michael ORCID: 0000-0001-6813-2630
(2018)
Constructing data marts from web sources using a graph common model.
PhD thesis, Dublin City University.
At a time when humans and devices are generating more information than ever, activities such as data mining and machine learning become crucial. These activities enable us to understand and interpret the information we have and to predict, or better prepare ourselves for, future events. However, data mining cannot be performed without a layer of data management to clean, integrate, process and make available the necessary datasets. To that end, large and costly data flow processes such as Extract-Transform-Load are needed to extract data from disparate information sources and generate analysis-ready datasets. These datasets generally take the form of multi-dimensional cubes from which different data views can be extracted for different analyses. Creating a multi-dimensional cube from integrated data sources is a significant undertaking. In this research, we present a methodology to generate these cubes automatically or, in some cases, close to automatically, with very little user interaction. A construct called a StarGraph acts as the canonical model for our system, into which imported data sources are transformed. An ontology-driven process controls the integration of StarGraph schemas, and simple OLAP-style functions generate the cubes or datasets. An extensive evaluation is carried out using a large number of agri data sources, with user-defined case studies identifying the sources for integration and the types of analyses required for the final data cubes.
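As an illustration of how different views can be extracted from a multi-dimensional cube, the following is a minimal Python sketch of an OLAP-style roll-up. It is not the thesis's StarGraph implementation: the fact rows, the dimension names (`year`, `region`, `product`), the measure `quantity`, and the `roll_up` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows; every value here is invented for illustration.
FACTS = [
    {"year": 2017, "region": "North", "product": "milk", "quantity": 10},
    {"year": 2017, "region": "South", "product": "milk", "quantity": 5},
    {"year": 2018, "region": "North", "product": "milk", "quantity": 7},
    {"year": 2018, "region": "North", "product": "beef", "quantity": 3},
]


def roll_up(facts, dimensions, measure):
    """Sum `measure` over the rows, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        # The group key is the tuple of values for the retained dimensions.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Two views of the same data at different levels of detail.
by_year = roll_up(FACTS, ["year"], "quantity")
by_year_region = roll_up(FACTS, ["year", "region"], "quantity")
```

Each call keeps a subset of the dimensions and aggregates the measure over the rest, which is the sense in which one cube can serve several analyses.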
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2018
Refereed: No
Supervisor(s): Roantree, Mark
Uncontrolled Keywords: Data Analytics; Data Warehousing; Data Integration; ETL