Search, as a well-known information retrieval strategy, is widely researched and developed for academic and commercial use. However, with ever-increasing amounts of multimedia data, search alone cannot satisfy user requirements for exploring multimedia resources. Preprocessing of multimedia resources is therefore necessary to identify potentially related documents, reducing retrieval time and improving browsing efficiency. Connecting relevant resources with hyperlinks is a widely used approach for multimedia collections. However, hyperlinks are usually defined on the basis of textual information; for example, hyperlinks in Wikipedia connect a term to relevant webpages. By contrast, content-based multimedia retrieval makes it possible to analyse multimedia materials based on their actual content. The availability of these technologies for multimedia search motivates further investigation of content-based hyperlinking for multimedia collections.
This thesis addresses the novel task of automatically creating hyperlinks within TV data collections for content-based browsing and navigation. Hyperlinks are created between video segments determined to be related based on their multimodal features.
First, we detail methodologies for identifying potentially relevant segments across the TV collection based on automatically detected spoken information, and we examine which of these approaches segment video streams most effectively.
Next, we incorporate both low-level and high-level visual features to improve hyperlinking quality. We detail the implementation of data fusion schemes to combine multimodal features.
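To illustrate the general idea of combining multimodal features, the following is a minimal sketch of weighted late fusion, one common data fusion scheme. The modality names, scores, and weights are hypothetical and are not the thesis's actual configuration:

```python
def late_fusion(scores, weights):
    """Combine per-modality relevance scores for a candidate segment
    into a single ranking score via a weighted sum (late fusion).

    scores  -- dict mapping modality name to a normalised score in [0, 1]
    weights -- dict mapping modality name to its fusion weight
    """
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in scores) / total

# Hypothetical scores for one candidate segment: spoken transcript
# similarity plus low-level and high-level visual similarity.
segment_scores = {"transcript": 0.8, "visual_low": 0.4, "visual_high": 0.6}
fusion_weights = {"transcript": 0.5, "visual_low": 0.2, "visual_high": 0.3}

combined = late_fusion(segment_scores, fusion_weights)
print(combined)  # → 0.66
```

In a hyperlinking setting, candidate target segments would be ranked by such a combined score; the weights themselves are typically tuned on held-out data.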
Finally, we propose a novel hyperlinking framework combining query enrichment, spoken data analysis, and multimodal fusion. Experiments, evaluated through a crowdsourcing study, demonstrate the effectiveness of this framework in terms of user satisfaction.