Investigating cross-language speech retrieval for a spontaneous conversational speech collection
Inkpen, Diana, Alzghool, Muath, Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) and Oard, Douglas W.
(2006)
Investigating cross-language speech retrieval for a spontaneous conversational speech collection.
In: HLT-NAACL 2006 - The Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, 8-9 June 2006, New York, USA.
Cross-language retrieval of spontaneous speech combines the challenges of working with noisy automated transcription and language translation. The CLEF 2005 Cross-Language Speech Retrieval (CL-SR) task provides a standard test collection to investigate these challenges. We show that we can improve retrieval performance: by careful selection of the term weighting scheme; by decomposing automated transcripts into phonetic substrings to help ameliorate transcription errors; and by combining automatic transcriptions with manually-assigned metadata. We further show that topic translation with online machine translation resources yields effective CL-SR.
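
The idea of decomposing a transcript into phonetic substrings can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `phonetic_substrings`, the phoneme symbols, the `-` joiner, and the default substring length `n` are all assumptions made for illustration, and the input is assumed to be already converted to a phoneme sequence. The point it shows is that overlapping substrings let a partly misrecognized word still share some indexing units with the intended word.

```python
from typing import List


def phonetic_substrings(phonemes: List[str], n: int = 4) -> List[str]:
    """Return overlapping n-grams over a phoneme sequence.

    Hypothetical helper: `n` and the '-' joiner are illustrative choices,
    not values taken from the paper.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not phonemes:
        return []
    if len(phonemes) < n:
        # Too short to split: keep the whole sequence as one indexing unit.
        return ["-".join(phonemes)]
    return ["-".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


# Illustrative phoneme sequences (made up for this sketch).
intended = ["k", "ae", "m", "p", "s"]
recognized = ["k", "ae", "m", "p", "t"]  # last phoneme misrecognized

# Substrings shared by both sequences can still match at retrieval time.
shared = set(phonetic_substrings(intended, 3)) & set(phonetic_substrings(recognized, 3))
```

Here the two sequences differ in their final phoneme, yet they still share the substrings that do not include it, so an index built on such units can tolerate some transcription errors that would defeat whole-word matching.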