Munetsi, Ruvimbo Maud (ORCID: 0009-0004-9458-8382), Mukande, Tendai (ORCID: 0000-0002-0654-7141) and O'Connor, Noel E. (ORCID: 0000-0002-4033-9135)
(2026)
Morphology-Aware Retrieval for Low-Resource Environments: Advancing Information Retrieval for Shona Language.
In: 49th International ACM SIGIR Conference on Research and Development in Information Retrieval, 20-24 July 2026, Melbourne, Australia.
ISBN 979-8-4007-2599-9/2026/07
(In Press)
Abstract
Research on Information Retrieval (IR) has historically prioritised high-resource languages such as English and Chinese, with less attention given to many low-resource languages. For example, Shona, a Bantu language spoken by approximately 12 million people in Zimbabwe and neighbouring countries, remains under-explored in IR research despite its widespread societal use in Southern Africa. In this work, we present a preliminary study of Shona IR using sparse and dense retrieval models, demonstrating significant performance limitations due to morphological complexity and data scarcity. Based on these findings, we propose a framework to advance Shona IR by building a large-scale benchmark dataset to support morphology-aware retrieval. We hypothesise that improving Shona IR supports equitable access to digital information and enables language-inclusive AI technologies aligned with global development priorities such as access to education, dissemination of healthcare information, and digital inclusion.
Metadata
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Event Type: | Conference |
| Refereed: | Yes |
| Uncontrolled Keywords: | Information Retrieval, Low-Resource Languages, Shona, Morphology-Aware Retrieval, Benchmark Datasets |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Machine learning; Computer Science > Machine translating |
| DCU Faculties and Centres: | Research Institutes and Centres > INSIGHT Centre for Data Analytics |
| Published in: | Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '26), July 20-24, 2026, Melbourne, VIC, Australia. ACM. ISBN 979-8-4007-2599-9/2026/07 |
| Publisher: | ACM |
| Official URL: | https://sigir2026.org/en-AU |
| Copyright Information: | Authors |
| Funders: | Research Ireland under Grant Number SFI/12/RC/2289 P2 (Insight Research Ireland Centre for Data Analytics) |
| ID Code: | 32643 |
| Deposited On: | 18 May 2026 09:28 by Tendai Mukande. Last Modified 18 May 2026 09:28 |
Documents
Full text available as: PDF (452kB), Creative Commons: Attribution 4.0