Video content can be automatically analysed and indexed using trained classifiers that map low-level features to semantic concepts. Such classifiers require training data consisting of sets of images that contain those concepts, and it has recently been shown that suitable training data can be located by issuing text-based search queries to image databases on the internet. The challenge we address here is formulating the text queries that locate these training images. In this paper we present preliminary results on TRECVid data for concept classification using automatically crawled images as training data, and we compare these results with those obtained from manually annotated training sets.
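As a rough illustration of the query-formulation step, the following sketch generates candidate text queries for a visual concept by combining the concept name with synonyms and disambiguating context terms. This is a hypothetical, minimal example, not the authors' actual method; the function name, parameters, and example terms are all illustrative assumptions.

```python
# Hypothetical sketch: form text queries for crawling training images
# for a visual concept. Synonyms and context terms are illustrative.

def build_queries(concept, synonyms=(), context_terms=()):
    """Generate candidate text queries for an image search engine."""
    queries = [concept]
    # Alternative names for the same concept widen recall.
    queries.extend(synonyms)
    # Pairing the concept with context words narrows ambiguous terms.
    queries.extend(f"{concept} {ctx}" for ctx in context_terms)
    return queries

queries = build_queries("airplane",
                        synonyms=("aeroplane", "aircraft"),
                        context_terms=("runway", "flying"))
```

The returned queries would then be sent to an image search service, and the retrieved images used as (noisy) positive training examples for the concept classifier.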
The 3rd Annual Meeting of the EPSRC Network on Vision & Language and the 1st Technical Meeting of the European Network on Integrating Vision and Language, a workshop of the 25th International Conference on Computational Linguistics (COLING 2014).
Association for Computational Linguistics. ISBN 978-1-873769-28-1.