Nguyen-Ho, Thang-Long, Tran, Ly-Duyen (ORCID: 0000-0002-9597-1832), Minh-Triet, Tran, Gurrin, Cathal (ORCID: 0000-0003-2903-3968) and Healy, Graham (ORCID: 0000-0001-6429-6339) (2025)
T-EAGLE: Capturing Temporal Narratives via Sequence Captioning and Text Matching.
LSC '25: Proceedings of the 8th Annual ACM Workshop on the Lifelog Search Challenge. pp. 23-27.
ISBN 979-8-4007-1857-1
Abstract
There is a growing need to retrieve specific events or information
from personal lifelog data, but this is particularly challenging due
to the massive scale and the passive nature of data capture by lifelogging devices. Current systems typically rely on image similarity
for single, isolated images, and therefore struggle to capture the user intent
expressed in natural language and the semantic links between the
images and activities occurring over time. To address this issue, we
propose a novel lifelog retrieval framework that explicitly combines
both visual and temporal similarity in a multi-stage process, shifting the focus from single images to coherent sequences of actions.
Our approach uses image embeddings to initialize a set of candidate images. Importantly, the system then re-evaluates the query
similarity based on action descriptions which contain temporal
information across image sequences. Action captioning, integrated
into the indexing process, captures richer temporal and semantic context, allowing the system to distinguish between visually
similar but semantically distinct events. Additionally, the system
incorporates an evidence-based question answering mechanism, in
which the narratives of the retrieved sequences provide contextual
grounding for the answering model. The paper proposes a hybrid
retrieval framework that combines image similarity for candidate
initialization and visual-textual similarity for event retrieval. The
integration of action descriptions enables a language-based temporal representation of events. These descriptions are extracted offline through
semantic content analysis and serve as the basis for building an
evidence-based question answering module using these narratives
as context. This approach helps bridge the gap between user intent
and the multimodal, temporally structured nature of lifelog data.
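The two-stage idea described in the abstract (candidate initialization by image similarity, then re-scoring against sequence-level action descriptions) can be illustrated with a small sketch. This is not the authors' implementation: the `CaptionedSequence` type, the `retrieve` function, the `alpha` weighting, and the token-overlap text score are all assumptions chosen to keep the example self-contained, and a real system would use learned image and text encoders instead.

```python
import math
from dataclasses import dataclass


@dataclass
class CaptionedSequence:
    """Hypothetical index entry: a run of lifelog images plus a caption generated offline."""
    seq_id: str
    image_embeddings: list[list[float]]
    caption: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def token_overlap(query: str, caption: str) -> float:
    """Jaccard overlap of lowercase tokens; a stand-in for a learned text matcher."""
    q, c = set(query.lower().split()), set(caption.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0


def retrieve(query_embedding, query_text, index, n_candidates=2, alpha=0.5):
    """Return sequence ids ranked by a blend of visual and caption similarity."""
    # Stage 1: score each sequence by its best-matching image and keep the top candidates.
    visual = {
        s.seq_id: max((cosine(query_embedding, e) for e in s.image_embeddings), default=0.0)
        for s in index
    }
    candidates = sorted(index, key=lambda s: visual[s.seq_id], reverse=True)[:n_candidates]

    # Stage 2: re-score the candidates using the text of their action descriptions.
    scored = [
        (alpha * visual[s.seq_id] + (1 - alpha) * token_overlap(query_text, s.caption), s.seq_id)
        for s in candidates
    ]
    return [seq_id for _, seq_id in sorted(scored, reverse=True)]
```

With two visually similar kitchen sequences, one captioned as cooking and one as washing dishes, the second stage is what lets a query about washing dishes rank the right sequence first; setting `alpha=1.0` falls back to purely visual ranking.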
Metadata
| Item Type: | Article (Published) |
|---|---|
| Refereed: | Yes |
| Uncontrolled Keywords: | Lifelog, interactive retrieval systems, semantic embedding |
| Subjects: | Computer Science > Computational complexity; Computer Science > Computer engineering; Computer Science > Computer networks; Computer Science > Computer software |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Publisher: | Association for Computing Machinery |
| Official URL: | https://dl.acm.org/doi/10.1145/3729459.3748691 |
| Copyright Information: | Authors |
| ID Code: | 31772 |
| Deposited On: | 04 Nov 2025 15:38 by Gordon Kennedy. Last Modified 04 Nov 2025 15:38 |
Documents
Full text available as: PDF (4MB), Creative Commons: Attribution 4.0