DORAS | DCU Research Repository

Data quality for web log data using a Hadoop environment

Yang, Qishan and Helfert, Markus (ORCID: 0000-0001-6546-6408) (2016) Data quality for web log data using a Hadoop environment. In: 21st ICIQ 2016, 22-23 Jun 2016, Ciudad Real, Spain.

Abstract
Solving data quality problems is important for data warehouse construction and operation. This paper is based on the development of a web log warehouse. It proposes a methodology for handling data quality problems during data preprocessing within the log warehouse. It provides a hierarchical data warehouse architecture that is suited to saving resources and meeting ad hoc requirements. The data preprocessing is completed using Hadoop together with its sub-projects, such as Hive and HBase. In this paper we compare a Hadoop setup with an Oracle-based architecture.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Hadoop
Subjects: Computer Science > Information storage and retrieval systems
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Data Quality for Web Log Data Using a Hadoop Environment. (21).
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 21297
Deposited On: 29 Jul 2016 10:19 by Qishan Yang. Last Modified: 13 Mar 2019 14:47
Documents

Full text available as:

PDF (Data Quality for Web Log Data Using a Hadoop Environment), 527kB