
DORAS | DCU Research Repository


Load-balancing distributed outer joins through operator decomposition

Cheng, Long (ORCID: 0000-0003-1638-059X), Kotoulas, Spyros (ORCID: 0000-0003-4754-6433), Liu, Qingzhi and Wang, Ying (2019) Load-balancing distributed outer joins through operator decomposition. Journal of Parallel and Distributed Computing, 132. pp. 21-35. ISSN 0743-7315

Abstract
High-performance data analytics relies heavily on the efficient execution of distributed data operators such as distributed joins. A large number of join methods have been proposed and evaluated in parallel and distributed environments. However, most of them focus on inner joins, and little published work provides detailed implementations and analyses of outer joins. In this work, we present POPI (Partial Outer join & Partial Inner join), a novel method to load-balance large parallel outer joins by decomposing them into two operations: a large outer join over input data that does not present significant skew, and an inner join over data presenting significant skew. We present the detailed implementation of our approach and show that POPI can be implemented over a variety of architectures and underlying join implementations. Moreover, our experimental evaluation on a distributed-memory platform demonstrates that, compared to current approaches, the proposed method improves outer join performance under varying data skew and exhibits excellent load-balancing properties.
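The decomposition described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how skewed keys are detected or where skewed tuples are processed, so the sketch assumes (a) exact key counting against a hypothetical `threshold`, (b) integer join keys hash-partitioned with `k % n`, and (c) a common skew-handling placement in which skewed tuples of the outer relation `R` stay on their original worker while the matching tuples of `S` are copied to every worker. All function names are hypothetical. Because every worker sees all `S` tuples for a skewed key, skewed `R` tuples with no match can be padded with `None` locally, so the union of the two partial joins equals the full left outer join.

```python
from collections import Counter, defaultdict


def left_outer_join(R, S):
    """Reference left outer join of (key, value) lists; unmatched R rows get None."""
    index = defaultdict(list)
    for k, w in S:
        index[k].append(w)
    out = []
    for k, v in R:
        if k in index:
            out.extend((k, v, w) for w in index[k])
        else:
            out.append((k, v, None))
    return out


def hash_load(R_parts):
    """R tuples per worker under plain hash redistribution (integer keys assumed)."""
    n = len(R_parts)
    load = [0] * n
    for part in R_parts:
        for k, _ in part:
            load[k % n] += 1
    return load


def decomposed_left_outer_join(R_parts, S_parts, threshold):
    """Hypothetical POPI-style decomposition: outer join on non-skewed keys,
    inner join on skewed keys. Returns (rows, R tuples processed per worker)."""
    n = len(R_parts)
    # Assumed skew detector: exact global key counts above `threshold`.
    counts = Counter(k for part in R_parts for k, _ in part)
    skewed = {k for k, c in counts.items() if c > threshold}

    # S tuples with skewed keys are assumed to be few, so each worker sees all of them.
    S_skew = defaultdict(list)
    for part in S_parts:
        for k, w in part:
            if k in skewed:
                S_skew[k].append(w)

    # Non-skewed tuples of both relations are hash-redistributed by key.
    R_hash = [[] for _ in range(n)]
    S_hash = [[] for _ in range(n)]
    for part in R_parts:
        for k, v in part:
            if k not in skewed:
                R_hash[k % n].append((k, v))
    for part in S_parts:
        for k, w in part:
            if k not in skewed:
                S_hash[k % n].append((k, w))

    rows, load = [], []
    for i in range(n):
        # Partial outer join over the redistributed, non-skewed data.
        rows += left_outer_join(R_hash[i], S_hash[i])
        # Partial inner join: skewed R tuples stay on worker i.
        local_skew = [(k, v) for k, v in R_parts[i] if k in skewed]
        for k, v in local_skew:
            # S_skew holds every S tuple for a skewed key, so a missing key has no
            # match anywhere; pad with None to keep left outer join semantics.
            for w in S_skew.get(k, [None]):
                rows.append((k, v, w))
        load.append(len(R_hash[i]) + len(local_skew))
    return rows, load


# Usage: key 0 dominates R; tuples start round-robin over 4 workers.
n = 4
R = [(0, f"r{i}") for i in range(1000)] + [(k, f"x{k}") for k in range(1, 100)]
S = [(k, f"s{k}") for k in range(50)] + [(0, "s0b")]
R_parts = [R[i::n] for i in range(n)]
S_parts = [S[i::n] for i in range(n)]
rows, load = decomposed_left_outer_join(R_parts, S_parts, threshold=100)
assert Counter(rows) == Counter(left_outer_join(R, S))
print(hash_load(R_parts), load)  # [1024, 25, 25, 25] [274, 275, 275, 275]
```

In this toy input, plain hash redistribution sends all 1,000 tuples with key 0 to a single worker, while the decomposed version spreads the work almost evenly. The trade-off in this sketch is the cost of replicating the skewed part of `S`, which stays small only if the skewed keys have few matches in `S`.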
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Distributed join; Outer join; Data skew; Load balancing; Spark
Subjects: Computer Science > Computer engineering
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Elsevier
Official URL: http://dx.doi.org/10.1016/j.jpdc.2019.05.008
Copyright Information: © 2019 Elsevier
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 799066.
ID Code: 24292
Deposited On: 20 Mar 2020 12:50 by Long Cheng. Last Modified: 14 May 2021 03:30
Documents

Full text available as: PDF (popi.pdf, 251 kB)