DORAS | DCU Research Repository

Morphology-Aware Retrieval for Low-Resource Environments: Advancing Information Retrieval for Shona Language

Munetsi, Ruvimbo Maud (ORCID: 0009-0004-9458-8382), Mukande, Tendai (ORCID: 0000-0002-0654-7141) and O'Connor, Noel E. (ORCID: 0000-0002-4033-9135) (2026) Morphology-Aware Retrieval for Low-Resource Environments: Advancing Information Retrieval for Shona Language. In: 49th International ACM SIGIR Conference on Research and Development in Information Retrieval, 20-24 July 2026, Melbourne, Australia. ISBN 979-8-4007-2599-9/2026/07 (In Press)

Abstract
Research on Information Retrieval (IR) has historically prioritised high-resource languages such as English and Chinese, with less attention given to many low-resource languages. For example, Shona, a Bantu language spoken by approximately 12 million people in Zimbabwe and neighbouring countries, remains under-explored in IR research despite its widespread societal use in Southern Africa. In this work, we present a preliminary study of Shona IR using sparse and dense retrieval models, demonstrating significant performance limitations due to morphological complexity and data scarcity. Based on these findings, we propose a framework to advance Shona IR by developing a large-scale benchmark dataset to support morphology-aware retrieval. We hypothesise that improving Shona IR supports equitable access to digital information and enables language-inclusive AI technologies aligned with global development priorities such as access to education, dissemination of healthcare information, and digital inclusion.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Information Retrieval, Low-Resource Languages, Shona, Morphology-Aware Retrieval, Benchmark Datasets
Subjects: Computer Science > Artificial intelligence; Computer Science > Machine learning; Computer Science > Machine translating
DCU Faculties and Centres: Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in: Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '26), July 20-24, 2026, Melbourne, VIC, Australia. ACM. ISBN 979-8-4007-2599-9/2026/07
Publisher: ACM
Official URL: https://sigir2026.org/en-AU
Copyright Information: Authors
Funders: Research Ireland under Grant Number SFI/12/RC/2289 P2 (Insight Research Ireland Centre for Data Analytics)
ID Code: 32643
Deposited On: 18 May 2026 09:28 by Tendai Mukande. Last Modified 18 May 2026 09:28
Documents

Full text available as:

PDF: lre286 (3).pdf (452kB)
Creative Commons: Attribution 4.0