MPEG-7 is a generic standard used to encode information about multimedia content and, often, different MPEG-7 Descriptor Schemas are instantiated for different representations of a shot, such as text annotations and visual features. Our work focuses on two main areas: the first is devising a method for combining text annotations and visual features into one single MPEG-7 description, and the second is defining how best to carry out text and non-text queries for retrieval via a combined description.
We align the video retrieval process to a text retrieval process based on the TF*IDF vector space model via clustering of low-level visual features. Our assumption is that shots within the same cluster are not only similar visually but also, to a certain extent, semantically. Our method maps the visual features of each shot onto a term weight vector via clustering. This vector is then combined with the original text features of the shot (i.e. ASR transcripts) to produce the final searchable index.
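The indexing idea described above can be sketched in a few lines. This is an illustrative toy only, not the thesis's actual implementation: the shot data, the `vis_` pseudo-term naming, and the `build_index` helper are all hypothetical, and the cluster label stands in for the result of clustering low-level visual features.

```python
import math
from collections import Counter

# Hypothetical toy data: each shot has ASR transcript terms and a
# visual-cluster label (assumed to come from clustering visual features).
shots = {
    "shot1": {"asr": ["weather", "forecast", "rain"], "cluster": "c7"},
    "shot2": {"asr": ["rain", "storm"],               "cluster": "c7"},
    "shot3": {"asr": ["football", "goal"],            "cluster": "c2"},
}

def build_index(shots):
    """Treat the visual cluster as an extra pseudo-term alongside the
    ASR terms, then weight every term with TF*IDF over the collection."""
    docs = {sid: s["asr"] + ["vis_" + s["cluster"]] for sid, s in shots.items()}
    n = len(docs)
    # Document frequency: in how many shots does each term appear?
    df = Counter(t for terms in docs.values() for t in set(terms))
    index = {}
    for sid, terms in docs.items():
        tf = Counter(terms)
        index[sid] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return index

index = build_index(shots)
```

Because shot1 and shot2 share the cluster pseudo-term `vis_c7`, a query matching one of them can also surface the other, even when their ASR terms differ, which is the intended effect of adding cluster-derived meaning to a shot.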
Our TRECVID2002 and TRECVID2003 experiments show that adding extra meaning to a shot based on the shots from the same cluster is useful when each video in the collection contains a high proportion of similar shots, for example in documentaries. Adding meaning to a shot based on the shots that surround it might be an effective method for video retrieval when each video in the collection has a low proportion of similar shots, such as TV news programmes.
Metadata
Item Type:
Thesis (PhD)
Date of Award:
2004
Refereed:
No
Supervisor(s):
Smeaton, Alan F.
Uncontrolled Keywords:
Video compression; MPEG (Video coding standard); Video retrieval