Parallel and distributed clustering framework for big spatial data mining

Bendechache, Malika; Tari, A-Kamel; Kechadi, M-Tahar

Bendechache, Malika ORCID: 0000-0003-0069-1860, Tari, A-Kamel and Kechadi, M-Tahar ORCID: 0000-0002-0176-6281 (2019) Parallel and distributed clustering framework for big spatial data mining. International Journal of Parallel, Emergent and Distributed Systems, 34 (6). pp. 671-689. ISSN 1744-5760

Abstract

Clustering techniques are very attractive for identifying and extracting patterns of interest from datasets. However, their application to very large spatial datasets presents numerous challenges, such as high dimensionality, heterogeneity, and the high complexity of some algorithms. Distributed clustering techniques are a good way to address the Big Data challenges (e.g., Volume, Variety, Veracity, and Velocity). In this paper, we developed and implemented a Dynamic Parallel and Distributed Clustering (DPDC) approach that can analyse Big Data within a reasonable response time and produce accurate results using existing computing and storage infrastructure, such as cloud computing. The DPDC approach consists of two phases. The first phase is fully parallel and generates local clusters; the second phase aggregates the local results to obtain global clusters. The aggregation phase is designed so that the final clusters are compact and accurate while the overall process remains efficient in time and memory. DPDC was thoroughly tested and compared with the well-known clustering algorithms BIRCH and CURE. The results show that the approach not only produces high-quality results but also scales well by taking advantage of the Hadoop MapReduce paradigm or any other distributed system.
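The two-phase structure described in the abstract (parallel local clustering, then aggregation into global clusters) can be illustrated with a small single-machine sketch. It is not the authors' DPDC implementation. The following are all assumptions made for illustration: DBSCAN as the local algorithm (DBSCAN appears among the keywords, but how the paper uses it is not shown here), bounding boxes as cluster summaries, an eps-based merge rule, a thread pool standing in for Hadoop workers, and every function name (local_dbscan, bbox, close, aggregate, dpdc_sketch).

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def local_dbscan(points, eps, min_pts):
    """Phase 1 (hypothetical): plain O(n^2) DBSCAN on one partition; noise is dropped."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id

    def region(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    k = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        labels[i] = k
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = k  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = k
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours)
        k += 1
    return [[points[i] for i in range(n) if labels[i] == c] for c in range(k)]


def bbox(cluster):
    # Hypothetical compact summary of a local cluster: (min_x, min_y, max_x, max_y).
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return min(xs), min(ys), max(xs), max(ys)


def close(a, b, eps):
    # True if the gap between two boxes is at most eps on both axes.
    return (a[0] <= b[2] + eps and b[0] <= a[2] + eps and
            a[1] <= b[3] + eps and b[1] <= a[3] + eps)


def aggregate(local_clusters, eps):
    """Phase 2 (hypothetical): merge local clusters whose bounding boxes nearly touch."""
    boxes = [bbox(c) for c in local_clusters]
    parent = list(range(len(boxes)))  # union-find forest over local clusters

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(boxes)):
        for b in range(a + 1, len(boxes)):
            if close(boxes[a], boxes[b], eps):
                parent[find(a)] = find(b)
    groups = {}
    for i, cluster in enumerate(local_clusters):
        groups.setdefault(find(i), []).extend(cluster)
    return list(groups.values())


def dpdc_sketch(partitions, eps, min_pts, workers=4):
    # Threads stand in for distributed workers (e.g. Hadoop mappers).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = list(pool.map(lambda p: local_dbscan(p, eps, min_pts), partitions))
    local_clusters = [c for clusters in per_partition for c in clusters]
    return aggregate(local_clusters, eps)


# One elongated cluster is cut by the partition boundary near x = 1.0.
part_a = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (0.9, 0.0)]
part_b = [(1.2, 0.0), (1.5, 0.0), (1.8, 0.0), (10.0, 10.0), (10.3, 10.0), (10.6, 10.0)]
global_clusters = dpdc_sketch([part_a, part_b], eps=0.5, min_pts=2)
print(sorted(len(c) for c in global_clusters))  # [3, 7]
```

Summarising each local cluster by a small box keeps the data exchanged in the second phase tiny compared with the raw points, which is the general motivation for aggregating summaries rather than points. A bounding box is a crude summary, though: it can merge clusters whose boxes nearly touch even when their points are far apart, so a real aggregation step would likely use a tighter representation.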

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Uncontrolled Keywords:	Big Data; MapReduce; Hadoop; Spatial data mining; Clustering; Distributed Clustering; Parallel clustering; DBSCAN; Dynamic K-means
Subjects:	Computer Science > Algorithms; Computer Science > Computational complexity; Computer Science > Machine learning
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Publisher:	Taylor & Francis
Official URL:	http://dx.doi.org/10.1080/17445760.2018.1446210
Copyright Information:	© 2019 Taylor & Francis
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland under Grant Number SFI/12/RC/2289.
ID Code:	24626
Deposited On:	16 Jun 2020 16:24 by Malika Bendechache. Last Modified 16 Jun 2020 16:24

Documents

Full text available as: PDF (2MB)

DORAS | DCU Research Repository