
DORAS | DCU Research Repository



T-EAGLE: Capturing Temporal Narratives via Sequence Captioning and Text Matching

Nguyen-Ho, Thang-Long, Tran, Ly-Duyen (ORCID: 0000-0002-9597-1832), Tran, Minh-Triet, Gurrin, Cathal (ORCID: 0000-0003-2903-3968) and Healy, Graham (ORCID: 0000-0001-6429-6339) (2025) T-EAGLE: Capturing Temporal Narratives via Sequence Captioning and Text Matching. LSC '25: Proceedings of the 8th Annual ACM Workshop on the Lifelog Search Challenge. pp. 23-27. ISBN 979-8-4007-1857-1

Abstract
There is a growing need to retrieve specific events or information from personal lifelog data, but this is particularly challenging due to the massive scale and the passive nature of data capture by lifelogging devices. Current systems typically rely on image similarity over single, isolated images and struggle to capture the user intent expressed in natural language and the semantic links between images and activities occurring over time. To address this issue, we propose a novel lifelog retrieval framework that explicitly combines visual and temporal similarity in a multi-stage process, shifting the focus from single images to coherent sequences of actions. Our approach uses image embeddings to initialize a set of candidate images; importantly, the system then re-evaluates query similarity against action descriptions that carry temporal information across image sequences. Action captioning, integrated into the indexing process, captures richer temporal and semantic context, allowing the system to distinguish between visually similar but semantically distinct events. Additionally, the system incorporates an evidence-based question answering mechanism in which the narratives of the retrieved sequences provide contextual grounding for the answering model. In summary, the paper proposes a hybrid retrieval framework that combines image similarity for candidate initialization with visual-textual similarity for event retrieval. The integration of action descriptions enables a language-based temporal representation of events; these descriptions are extracted offline through semantic content analysis and serve as the context for an evidence-based Question Answering module. This approach helps bridge the gap between user intent and the multimodal, temporally structured nature of lifelog data.
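
To make the multi-stage design described in the abstract concrete, the sketch below shows one way such a pipeline could look: image embeddings produce an initial candidate shortlist, which is then re-scored against pre-computed action-caption embeddings. This is only an illustrative sketch based on the abstract; the function names, the fusion weight, and the use of simple cosine similarity with NumPy are assumptions, not the authors' implementation.

# Hypothetical two-stage lifelog retrieval sketch in the spirit of the abstract:
# image-embedding candidate initialization followed by re-ranking against
# action-caption embeddings. All names and weights are assumptions.
import numpy as np

def cosine_sim(query, matrix):
    # query: (d,), matrix: (n, d) -> (n,) cosine similarities
    query = query / (np.linalg.norm(query) + 1e-8)
    matrix = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-8)
    return matrix @ query

def retrieve(query_vec, image_vecs, caption_vecs, top_k=100, alpha=0.5):
    # Stage 1: rank all images by visual similarity and keep a shortlist.
    visual = cosine_sim(query_vec, image_vecs)
    candidates = np.argsort(-visual)[:top_k]
    # Stage 2: re-score the shortlist against the action captions attached
    # to each candidate (one caption embedding per image for simplicity).
    textual = cosine_sim(query_vec, caption_vecs[candidates])
    fused = alpha * visual[candidates] + (1 - alpha) * textual
    order = np.argsort(-fused)
    return candidates[order], fused[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 512, 1000
    image_vecs = rng.normal(size=(n, d))    # stand-ins for image embeddings
    caption_vecs = rng.normal(size=(n, d))  # stand-ins for action-caption embeddings
    query_vec = rng.normal(size=d)          # stand-in for the encoded text query
    ids, scores = retrieve(query_vec, image_vecs, caption_vecs, top_k=50)
    print(ids[:5], scores[:5])

In a real system the stand-in vectors would come from a vision-language encoder for the images and a text encoder for the offline-generated action captions, and the re-ranked sequences would then be passed as textual context to the question answering module.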
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Lifelog, interactive retrieval systems, semantic embedding
Subjects: Computer Science > Computational complexity
Computer Science > Computer engineering
Computer Science > Computer networks
Computer Science > Computer software
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computing Machinery
Official URL: https://dl.acm.org/doi/10.1145/3729459.3748691
Copyright Information: Authors
ID Code: 31772
Deposited On: 04 Nov 2025 15:38 by Gordon Kennedy. Last Modified: 04 Nov 2025 15:38
Documents

Full text available as:

PDF (3729459.3748691.pdf) - Creative Commons: Attribution 4.0, 4MB