Yang, Qishan and Helfert, Markus ORCID: 0000-0001-6546-6408 (2016) Data quality for web log data using a Hadoop environment. In: 21st ICIQ 2016, 22-23 Jun 2016, Ciudad Real, Spain.
Abstract
Solving data quality problems is important for data warehouse construction and operation. This paper is based on the development of a web log warehouse. It proposes a methodology for addressing data quality problems during data preprocessing within the log warehouse, and it presents a hierarchical data warehouse architecture suited to saving resources and meeting ad hoc requirements. Data preprocessing is carried out using Hadoop together with sub-projects such as Hive and HBase. The paper also compares a Hadoop setup with an Oracle-based architecture.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Hadoop |
Subjects: | Computer Science > Information storage and retrieval systems |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Published in: | Data Quality for Web Log Data Using a Hadoop Environment. (21). |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
ID Code: | 21297 |
Deposited On: | 29 Jul 2016 10:19 by Qishan Yang. Last modified 13 Mar 2019 14:47 |
Documents
Full text available as:
PDF (Data Quality for Web Log Data Using a Hadoop Environment), 527kB