The searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically
related metadata. An important question is what can be expected from search of such transcriptions and different
sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech
test collection with manual and automatically derived metadata fields. Using this collection we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple field merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to set its parameters suitably using a suitable training set.
Metadata
Item Type:
Conference or Workshop Item (Paper)
Event Type:
Workshop
Refereed:
Yes
Additional Information:
Workshop held in conjunction with the 30th Annual International ACM SIGIR Conference 27 July 2007, Amsterdam
Uncontrolled Keywords:
searching spontaneous speech transcriptions; metadata; data
fusion; field combination;