Exploring the impact of training data bias on automatic generation of video captions

Smeaton, Alan F.; Graham, Yvette; McGuinness, Kevin; O'Connor, Noel E.; Quinn, Sean; Arazo Sánchez, Eric

Smeaton, Alan F. ORCID: 0000-0003-1028-8389, Graham, Yvette, McGuinness, Kevin ORCID: 0000-0003-1336-6477, O'Connor, Noel E. ORCID: 0000-0002-4033-9135, Quinn, Sean and Arazo Sánchez, Eric (2018) Exploring the impact of training data bias on automatic generation of video captions. In: 25th International Conference on Multimedia Modeling (MMM2019), 8 - 11 Jan 2019, Thessaloniki, Greece. ISBN 978-3-030-05710-7


Abstract

A major issue in machine learning is the availability of training data. Historically this meant having a sufficient volume of training data; more recently the concern has shifted to having sufficient unbiased training data. In this paper we focus on the effect of training data bias on an emerging multimedia application, the automatic captioning of short video clips. We use subsets of the same training data to build different video captioning models with the same machine learning technique, and we evaluate the models trained on the different subsets using a well-known video caption benchmark, TRECVid. We train using the MSR-VTT video-caption pairs, and we prune the set of captions describing each video so that it becomes more homogeneous, or more diverse, or we prune at random. We then assess the effectiveness of the caption-generating models trained with these variations using automatic metrics as well as direct assessment by human assessors. Our findings are preliminary and show that randomly pruning captions from the training data yields the worst performance, and that pruning to make the data more homogeneous, or more diverse, improves performance slightly compared to random pruning. Our work points to the need for more training data: more video clips but, more importantly, more captions for those videos.
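
The three pruning variants named in the abstract (more homogeneous, more diverse, random) can be illustrated with a minimal sketch. The abstract does not say how caption similarity is measured or how captions are selected, so everything specific here is an assumption made for illustration: the function names `jaccard`, `mean_similarity` and `prune` are hypothetical, token-overlap (Jaccard) similarity stands in for whatever measure the paper uses, and ranking captions by their mean similarity to the other captions of the same video is only one plausible selection rule.

```python
import random


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two captions (assumed stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_similarity(captions: list[str], i: int) -> float:
    """Average similarity of caption i to the other captions of the same video."""
    others = [jaccard(captions[i], c) for j, c in enumerate(captions) if j != i]
    return sum(others) / len(others) if others else 0.0


def prune(captions: list[str], keep: int, strategy: str, seed: int = 0) -> list[str]:
    """Keep `keep` captions for one video; strategy is 'homogeneous', 'diverse' or 'random'."""
    if strategy not in {"homogeneous", "diverse", "random"}:
        raise ValueError(f"unknown strategy: {strategy}")
    if keep >= len(captions):
        return list(captions)
    if strategy == "random":
        return random.Random(seed).sample(list(captions), keep)
    scores = [mean_similarity(captions, i) for i in range(len(captions))]
    # homogeneous: keep the captions most similar to the rest;
    # diverse: keep the captions least similar to the rest.
    order = sorted(
        range(len(captions)),
        key=lambda i: scores[i],
        reverse=(strategy == "homogeneous"),
    )
    # Return the survivors in their original order.
    return [captions[i] for i in sorted(order[:keep])]


# Made-up captions for a single video, for illustration only.
captions = [
    "a man is playing a guitar",
    "a man plays a guitar",
    "a man is playing guitar on stage",
    "someone strums an instrument",
    "a dog runs across a field",
]

for strategy in ("homogeneous", "diverse", "random"):
    print(strategy, prune(captions, 2, strategy))
```

In this toy input the homogeneous variant keeps the near-paraphrases about the guitar player, while the diverse variant keeps the captions that share the fewest words with the rest. A real experiment would apply the same per-video pruning across the whole training set before training each model.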

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	Video-to-language; Video captioning; Video understanding; Semantic similarity
Subjects:	Computer Science > Artificial intelligence; Computer Science > Machine learning; Computer Science > Multimedia systems; Computer Science > Digital video
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in:	Kompatsiaris, Ioannis, Huet, Benoit, Mezaris, Vasileios, Gurrin, Cathal and Cheng, Wen-Huang (eds.) MMM 2019: MultiMedia Modeling, Proceedings. Lecture Notes in Computer Science book series (LNCS) 11295(1). Springer. ISBN 978-3-030-05710-7
Publisher:	Springer
Official URL:	http://dx.doi.org/10.1007/978-3-030-05710-7_15
Copyright Information:	©2018 Springer
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland under grant numbers 12/RC/2289 and 15/SIRG/3283
ID Code:	23512
Deposited On:	01 Jul 2019 10:54 by Sean Quinn. Last Modified: 28 Apr 2022 10:27

Documents

Full text available as:

Smeaton2019_Chapter_ExploringTheImpactOfTrainingDa.pdf (PDF, 474kB)

DORAS | DCU Research Repository
