
DORAS | DCU Research Repository



T-EAGLE: Capturing Temporal Narratives via Sequence Captioning and Text Matching

Nguyen-Ho, Thang-Long, Tran, Ly-Duyen (ORCID: 0000-0002-9597-1832), Tran, Minh-Triet, Gurrin, Cathal (ORCID: 0000-0003-2903-3968) and Healy, Graham (ORCID: 0000-0001-6429-6339) (2025) T-EAGLE: Capturing Temporal Narratives via Sequence Captioning and Text Matching. LSC '25: Proceedings of the 8th Annual ACM Workshop on the Lifelog Search Challenge. pp. 23-27. ISBN 979-8-4007-1857-1

Abstract
There is a growing need to retrieve specific events or information from personal lifelog data, but this is particularly challenging due to the massive scale and the passive nature of data capture by lifelogging devices. Current systems typically rely on image similarity over single, isolated images and struggle to capture the user intent expressed in natural language and the semantic links between images and activities occurring over time. To address this issue, we propose a novel lifelog retrieval framework that explicitly combines visual and temporal similarity in a multi-stage process, shifting the focus from single images to coherent sequences of actions. Our approach uses image embeddings to initialize a set of candidate images; importantly, the system then re-evaluates query similarity against action descriptions that carry temporal information across image sequences. Action captioning, integrated into the indexing process, captures richer temporal and semantic context, allowing the system to distinguish between visually similar but semantically distinct events. Additionally, the system incorporates an evidence-based question answering mechanism in which the narratives of the retrieved sequences provide contextual grounding for the answering model. In summary, the paper proposes a hybrid retrieval framework that combines image similarity for candidate initialization with visual-textual similarity for event retrieval. The integration of action descriptions enables a language-based temporal representation of events; these descriptions are extracted offline through semantic content analysis and serve as the context for an evidence-based Question Answering module. This approach helps bridge the gap between user intent and the multimodal, temporally structured nature of lifelog data.
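
To make the multi-stage design described in the abstract concrete, the sketch below shows one way such a pipeline could look: image embeddings produce an initial candidate shortlist, which is then re-scored against pre-computed action-caption embeddings. This is only an illustrative sketch based on the abstract; the function names, the fusion weight, and the use of simple cosine similarity with NumPy are assumptions, not the authors' implementation.

# Hypothetical two-stage lifelog retrieval sketch in the spirit of the abstract:
# image-embedding candidate initialization followed by re-ranking against
# action-caption embeddings. All names and weights are assumptions.
import numpy as np

def cosine_sim(query, matrix):
    # query: (d,), matrix: (n, d) -> (n,) cosine similarities
    query = query / (np.linalg.norm(query) + 1e-8)
    matrix = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-8)
    return matrix @ query

def retrieve(query_vec, image_vecs, caption_vecs, top_k=100, alpha=0.5):
    # Stage 1: rank all images by visual similarity and keep a shortlist.
    visual = cosine_sim(query_vec, image_vecs)
    candidates = np.argsort(-visual)[:top_k]
    # Stage 2: re-score the shortlist against the action captions attached
    # to each candidate (one caption embedding per image for simplicity).
    textual = cosine_sim(query_vec, caption_vecs[candidates])
    fused = alpha * visual[candidates] + (1 - alpha) * textual
    order = np.argsort(-fused)
    return candidates[order], fused[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 512, 1000
    image_vecs = rng.normal(size=(n, d))    # stand-ins for image embeddings
    caption_vecs = rng.normal(size=(n, d))  # stand-ins for action-caption embeddings
    query_vec = rng.normal(size=d)          # stand-in for the encoded text query
    ids, scores = retrieve(query_vec, image_vecs, caption_vecs, top_k=50)
    print(ids[:5], scores[:5])

In a real system the stand-in vectors would come from a vision-language encoder for the images and a text encoder for the offline-generated action captions, and the re-ranked sequences would then be passed as textual context to the question answering module.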
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Lifelog, interactive retrieval systems, semantic embedding
Subjects: Computer Science > Computational complexity
Computer Science > Computer engineering
Computer Science > Computer networks
Computer Science > Computer software
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computing Machinery
Official URL: https://dl.acm.org/doi/10.1145/3729459.3748691
Copyright Information: Authors
ID Code: 31772
Deposited On: 04 Nov 2025 15:38 by Gordon Kennedy. Last Modified: 04 Nov 2025 15:38
Documents

Full text available as:

PDF (3729459.3748691.pdf) - Creative Commons: Attribution 4.0, 4MB