DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

A RAG Approach for Multi-Modal Open-ended Lifelog Question-Answering

Tran, Quang-Linh (ORCID: 0000-0002-5409-0916), Pham, Ngo Ngoc Diep (ORCID: 0009-0000-4453-7246), Truong, Quoc Trung (ORCID: 0009-0009-7868-1401), Nguyen, Minh Hung (ORCID: 0009-0006-1870-1528), Le, Hong Cat (ORCID: 0009-0000-7839-5709), Vu, Dang Khoi (ORCID: 0009-0002-1040-970X), Nguyen, Van Minh Thien (ORCID: 0009-0008-8325-1222), Nguyen, Van Kinh (ORCID: 0009-0000-1617-0520), Nguyen, Luu Phuong Ngoc Lam (ORCID: 0009-0007-2208-2640), Le, Tan (ORCID: 0009-0009-0739-7075), Dang, Minh Phuc (ORCID: 0009-0003-5114-0580), Nguyen, Binh (ORCID: 0000-0001-5249-9702), Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) and Gurrin, Cathal (ORCID: 0000-0003-2903-3968) (2025) A RAG Approach for Multi-Modal Open-ended Lifelog Question-Answering. In: The 15th ACM International Conference on Multimedia Retrieval, 30 June - 03 July, Chicago, USA. ISBN 979-8-4007-1877-9

Abstract
Lifelogging is the passive collection, storage and analysis of daily data through wearable sensors. Question Answering (QA) for lifelog data enables natural language interactions with personal daily life records, providing insights into individual routines and behaviours. While this task has great potential for personal analytics and memory augmentation, progress has been limited by the challenges of lifelog management, since lifelogs can comprise enormous multi-modal data sets spanning a lifetime. We introduce a Retrieval-Augmented Generation (RAG) approach for addressing the lifelog QA task. In this approach, a retrieval model first finds the lifelog events that contain the answers, and a large language model (LLM) then generates answers to the questions from the retrieved events. In addition, we construct an open-ended lifelog QA benchmark with 14,187 QA pairs to examine the RAG approach to lifelog QA. Using an embedding-based retrieval approach with the Stella 1.5B model, our lifelog context retriever achieves 77.67% Recall@5 and 94.35% Recall@20. Combined with the Mistral 7B model, the system achieves scores of 39.54% ROUGE-L and 3.475 Accuracy on a scale of 5 scored by GPT-4o. This approach potentially provides an effective solution to lifelog QA with high performance that does not require fine-tuning.
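As an illustration of the retrieval step and the Recall@k figures quoted in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each question has exactly one gold lifelog event, that events and questions are already embedded as vectors, and that ranking uses cosine similarity. The toy vectors, event IDs and function names are hypothetical, and no Stella or Mistral model is involved.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_top_k(query_vec, event_vecs, k):
    """Return the IDs of the k events most similar to the query embedding."""
    ranked = sorted(
        event_vecs,
        key=lambda event_id: cosine(query_vec, event_vecs[event_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(queries, event_vecs, k):
    """Fraction of questions whose gold event appears in the top-k results.

    Assumption: one gold event per question.
    """
    hits = sum(
        1
        for query_vec, gold_id in queries
        if gold_id in retrieve_top_k(query_vec, event_vecs, k)
    )
    return hits / len(queries)


# Hypothetical toy embeddings standing in for embedded lifelog events.
events = {"e1": [1.0, 0.0], "e2": [0.0, 1.0], "e3": [0.7, 0.7]}
# Each entry: (question embedding, ID of the event that contains the answer).
queries = [([0.9, 0.1], "e1"), ([0.1, 0.9], "e2"), ([1.0, 0.0], "e3")]

recall_1 = recall_at_k(queries, events, 1)
recall_2 = recall_at_k(queries, events, 2)
```

In a full RAG pipeline the top-k events returned by `retrieve_top_k` would be passed, together with the question, to an LLM that generates the answer; that step is omitted here.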
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Lifelog Question Answering; Multi-modal Question Answering Dataset; Large Language Models; Retrieval-Augmented Generation
Subjects: Computer Science > Artificial intelligence
Computer Science > Information retrieval
Computer Science > Lifelog
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: ICMR '25: Proceedings of the 2025 International Conference on Multimedia Retrieval. Association for Computing Machinery, New York, United States. ISBN 979-8-4007-1877-9
Publisher: Association for Computing Machinery, New York, United States
Funders: ADAPT
ID Code: 31395
Deposited On: 29 Aug 2025 12:53 by Quang-Linh Tran. Last Modified: 29 Aug 2025 12:53
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution 4.0
1MB